Nucleic acids and proteins with thioredoxin reductase activity

ABSTRACT

The present invention relates to the use of a variety of methods for generating functional thioredoxin reductase variants in which at least one physical, chemical or biological property of the variant is altered in a specific and desired manner when compared to the wild-type protein.

[0001] This application claims the benefit of the filing date of U.S. Ser. No. 60/289,029, filed May 4, 2001, U.S. Ser. No. 60/370,609, filed Apr. 5, 2002, and the provisional application by Desjarlais and Muchhal, entitled “Novel Nucleic Acids and Proteins with Thioredoxin Reductase Activity”, filed Apr. 29, 2002, serial number not assigned.

FIELD OF THE INVENTION

[0002] The present invention relates to the use of a variety of methods for generating functional thioredoxin reductase variants in which at least one physical, chemical or biological property of the variant is altered in a specific and desired manner when compared to the wild-type protein.

BACKGROUND OF THE INVENTION

[0003] Thioredoxin, a small dithiol protein, is a specific reductant for major food proteins and allergenic proteins, particularly allergenic proteins present in widely used foods from animal and plant sources. Thioredoxin reduces most proteins having disulfide (S-S) bonds to the sulfhydryl (SH) level. These proteins are allergenically active and less digestible in the oxidized (S-S) state; when reduced (SH state), they lose their allergenicity and/or become more digestible. Of particular importance is the thioredoxin reduction of disulfide bonds in proteins such as the albumins, globulins, gliadins, thionins, and glutenins found in many seeds and cereals, and in a number of proteins found in milk. See, for example, Kiss, F. et al. (1991), Arch. Biochem. Biophys. 287:337-340; Johnson, T. C. et al. (1987), Plant Physiol. 85:446-451; Kasarda, D. D. et al. (1976), Adv. Cer. Sci. Tech. 1:158-236; Osborne, T. B. et al. (1893), Amer. Chem. J. 15:392-471; Shewry, P. R. et al. (1985), Adv. Cer. Sci. Tech. 7:1-83; Dahle, L. K. et al. (1966), Cereal Chem. 43:682-688; Garcia-Olmedo, F. et al. (1987), Oxford Surveys of Plant Molecular and Cell Biology 4:275-335; Birk, Y. (1976), Meth. Enzymol. 45:695-739; Laskowski, M., Jr. et al. (1980), Ann. Rev. Biochem. 49:593-626; Weselake, R. J. et al. (1983), Plant Physiol. 72:809-812; and Birk, Y. (1985), Int. J. Peptide Protein Res. 25:113-131.

[0004] In addition, thioredoxin reduces the disulfide bonds in many toxic proteins, such as those found in snakes (Yang, C. C. (1967) Biochim. Biophys. Acta. 133:346-355; Howard, B. D. et al. (1977) Biochemistry 16:122-125), bees, and scorpions (Watt, D. D. et al. (1972) Toxicon 10:173-181), as well as the bacterial neurotoxins tetanus and botulinum (Schiavo, G. et al. (1990) Infection and Immunity 58:4136-4141; Kistner, A. et al. (1992) Naunyn-Schmiedeberg's Arch Pharmacol 345:227-234), and thereby reduces or in some instances eliminates their toxicity altogether.

[0005] Thioredoxin achieves this reduction when activated (reduced) either by nicotinamide adenine dinucleotide phosphate (NADPH) via NADP-thioredoxin reductase (physiological conditions) or by dithiothreitol, a chemical reductant. See, for example, U.S. Pat. No. 5,952,034, incorporated herein by reference in its entirety. Skin tests and feeding experiments carried out with sensitized dogs have shown that treatment of the food with reduced thioredoxin prior to ingestion eliminates or decreases the allergenicity of the food. Studies have also shown increased digestion of food and food proteins by pepsin and trypsin following reduction by thioredoxin.

[0006] Thus, it would be desirable to develop an efficient, low-cost method of using thioredoxin reductase to reduce the toxicity of toxic proteins, reduce the allergenicity of food, and increase the digestibility of food.

SUMMARY OF THE INVENTION

[0007] In accordance with the objects outlined above, the present invention provides a method for altering the cofactor specificity of thioredoxin reductase comprising inputting a set of coordinates for a thioredoxin reductase scaffold protein comprising amino acid positions; applying at least one protein design cycle; and generating a set of candidate variant proteins with altered cofactor dependency. Preferably, the scaffold protein is from an organism selected from the group consisting of E. coli, Bacillus subtilis, Mycobacterium leprae, Saccharomyces, Neurospora crassa, Arabidopsis, and human.

[0008] In an additional aspect, the cofactor specificity of the variant TR protein is NADPH or NADH. Preferably, the cofactor specificity is switched to NADH. In addition, other TR variants are generated that preferentially bind NADPH compared to NADH, preferentially bind NADH compared to NADPH, or bind both cofactors equally. In other embodiments, the catalytic efficiency for one or the other cofactor, or for both, is altered.

[0009] In an additional aspect, the variant TR proteins have amino acid substitutions selected from the group consisting of RA4W, RA5L, RA5M, RA5I, RA5F, RA5V, RA5Y, RA5A, RA5S, RA5C, RA5T, RA6T, RA6S, RA6Q, RA6G, RA6N, RA6D, RA6M, and RA6E.
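The substitution codes above appear to follow the pattern wild-type residue, Formula I position label, replacement residue (for example, RA5L would replace arginine at position A₅ with leucine). The Python sketch below parses such codes and applies them to positions A₄, A₅ and A₆. It assumes, for illustration only, that the three-letter variant names used in the figures (e.g., WVR, RYN) list the residues at A₄, A₅ and A₆ in that order, starting from the wild-type RRR; the names WILD_TYPE, parse_substitution and variant_name are hypothetical.

```python
import re

# Assumed wild-type residues at the designed positions (the "RRR" wild type).
WILD_TYPE = {"A4": "R", "A5": "R", "A6": "R"}
ORDER = ("A4", "A5", "A6")

# Wild-type residue, optional space, position label A4-A6, replacement residue.
SUBSTITUTION = re.compile(r"^([A-Z])\s*(A[4-6])([A-Z])$")


def parse_substitution(code):
    """Split a code such as 'RA5L' or 'R A5M' into (wild_type, position, new)."""
    match = SUBSTITUTION.match(code.strip())
    if match is None:
        raise ValueError(f"unrecognized substitution code: {code!r}")
    return match.groups()


def variant_name(codes, wild_type=WILD_TYPE):
    """Apply substitution codes and return the A4-A5-A6 residues as one string."""
    residues = dict(wild_type)
    for code in codes:
        original, position, new = parse_substitution(code)
        if residues[position] != original:
            raise ValueError(f"{code}: position {position} holds {residues[position]}")
        residues[position] = new
    return "".join(residues[p] for p in ORDER)


print(variant_name(["RA4W", "RA5V"]))   # WVR
print(variant_name(["R A5Y", "RA6N"]))  # RYN
```

The parser tolerates the stray space seen in some extracted codes (e.g., "R A5M") and rejects a code whose stated wild-type residue does not match the current residue at that position.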

[0010] In an additional aspect, the present invention provides a method for altering the substrate specificity of a TR protein comprising inputting a set of coordinates for a thioredoxin reductase scaffold protein comprising amino acid positions; applying at least one protein design cycle; and generating a set of candidate variant proteins with altered substrate specificity.

[0011] In an additional aspect, the present invention provides a method for altering the cofactor specificity of a target protein comprising inputting a set of coordinates for a thioredoxin reductase scaffold protein comprising amino acid positions; applying at least one protein design cycle; and generating a set of candidate variant proteins with altered cofactor specificity.

[0012] In an additional embodiment, the present invention provides a variant thioredoxin reductase (TR) protein comprising an isolated polypeptide molecule of Formula I

S₁-A₁-A₂-S₂-A₃-A₄-A₅-S₃-A₆-S₄  (I)

[0013] wherein

[0014] a) S₁ comprises a polypeptide sequence selected from the group consisting of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, and SEQ ID NO:7, or a sequence having substantial similarity thereto;

[0015] b) S₂ comprises a polypeptide sequence selected from the group consisting of SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, and SEQ ID NO:14, or a sequence having substantial similarity thereto;

[0016] c) S₃ comprises a polypeptide sequence selected from the group consisting of SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, and SEQ ID NO:21, or a sequence having substantial similarity thereto;

[0017] d) S₄ comprises a polypeptide sequence selected from the group consisting of SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, and SEQ ID NO:28, or a sequence having substantial similarity thereto;

[0018] e) A₁ is an amino acid moiety selected from the group consisting of serine, valine, glycine, alanine, leucine, isoleucine, methionine, phenylalanine, and tryptophan;

[0019] f) A₂ is an amino acid moiety selected from the group consisting of alanine, glycine, valine, leucine, isoleucine, methionine, phenylalanine, and tryptophan;

[0020] g) A₃ is an amino acid moiety selected from the group consisting of histidine, aspartic acid, glutamic acid, arginine, leucine, serine, threonine, cysteine, asparagine, glutamine, and tyrosine;

[0021] h) A₄ is an amino acid moiety selected from the group consisting of arginine, alanine, glycine, valine, leucine, isoleucine, methionine, phenylalanine, and tryptophan;

[0022] i) A₅ is an amino acid moiety selected from the group consisting of arginine, asparagine, glutamine, aspartic acid, glutamic acid, cysteine, serine, threonine, and lysine;

[0023] j) A₆ is an amino acid moiety selected from the group consisting of arginine, glutamic acid, asparagine, glutamine, aspartic acid, cysteine, serine, threonine, and lysine; provided that at least one of the following applies:

[0024] A₁ is not serine;

[0025] A₂ is not alanine;

[0026] A₃ is not histidine;

[0027] A₄ is not arginine;

[0028] A₅ is not arginine; or

[0029] A₆ is not arginine.
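The residue constraints of Formula I at positions A₁ through A₆, together with the proviso, can be expressed as a simple membership test. The Python sketch below checks only those six positions, supplied as a six-letter string in the order A₁ to A₆; it does not evaluate the S₁ to S₄ segments or the "substantial similarity" criterion. The one-letter codes are transcribed from paragraphs [0018] to [0023], and the function name satisfies_formula_i is hypothetical.

```python
# Allowed residues (one-letter codes) at each variable position of Formula I.
ALLOWED = {
    "A1": set("SVGALIMFW"),    # Ser, Val, Gly, Ala, Leu, Ile, Met, Phe, Trp
    "A2": set("AGVLIMFW"),     # Ala, Gly, Val, Leu, Ile, Met, Phe, Trp
    "A3": set("HDERLSTCNQY"),  # His, Asp, Glu, Arg, Leu, Ser, Thr, Cys, Asn, Gln, Tyr
    "A4": set("RAGVLIMFW"),    # Arg, Ala, Gly, Val, Leu, Ile, Met, Phe, Trp
    "A5": set("RNQDECSTK"),    # Arg, Asn, Gln, Asp, Glu, Cys, Ser, Thr, Lys
    "A6": set("RENQDCSTK"),    # Arg, Glu, Asn, Gln, Asp, Cys, Ser, Thr, Lys
}
# Residues named in the proviso; a variant must differ at one or more of them.
PROVISO = {"A1": "S", "A2": "A", "A3": "H", "A4": "R", "A5": "R", "A6": "R"}
POSITIONS = ("A1", "A2", "A3", "A4", "A5", "A6")


def satisfies_formula_i(residues):
    """Check a 6-letter string (A1..A6) against the allowed sets and the proviso."""
    if len(residues) != len(POSITIONS):
        raise ValueError("expected one residue for each of A1..A6")
    assigned = dict(zip(POSITIONS, residues.upper()))
    in_allowed_sets = all(assigned[p] in ALLOWED[p] for p in POSITIONS)
    differs_somewhere = any(assigned[p] != PROVISO[p] for p in POSITIONS)
    return in_allowed_sets and differs_somewhere


print(satisfies_formula_i("SAHRRR"))  # False: identical to the proviso residues
print(satisfies_formula_i("SAHWRR"))  # True: A4 changed from R to W
```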

[0030] In an additional aspect, the present invention provides a method for altering the oil content of plant cells comprising introducing an expression cassette, comprising a promoter functional in a plant cell operably linked to a DNA molecule encoding a modified thioredoxin reductase (TR) protein according to claim 1 or 22 that comprises an amino-terminal chloroplast transit peptide, into the cells of a plant so as to yield transformed plant cells; and regenerating said transformed plant cells to provide a differentiated transformed plant, wherein expression of the DNA molecule encoding the modified TR protein in said plant alters the cofactor specificity compared to the untransformed plant.

BRIEF DESCRIPTION OF THE DRAWINGS

[0031] FIG. 1 depicts the reaction catalyzed by thioredoxin reductases.

[0032] FIG. 2 shows that the active site pocket of reductases from a number of species is highly conserved.

[0033] FIG. 2A lists some of the most common TR sequences. The first column lists the GenBank ID number; A1 through A6 refer to the amino acids defined in Formula I (described below), and S2 and S3 are sequence domains separating A1 through A6 that are also defined in Formula I.

[0034] FIG. 2B lists some of the common glutathione reductase sequences.

[0035] FIGS. 2C and 2D represent the natural sequence diversity at each of the defined positions, grouped according to organism.

[0036] FIG. 2E lists known cofactor specificity and known amino acid placement.

[0037] FIG. 3 depicts various sequences that may be used in Formula I.

[0038] FIG. 4 provides an overview of the high-throughput TR screening methods.

[0039] FIG. 5 depicts protein purification strategies.

[0040] FIG. 6 depicts the kinetics of Arabidopsis NTR wild-type reductase with NAD(P)H.

[0041] FIG. 7 depicts variants obtained from NTR-1 Library 1.

[0042] FIG. 8 depicts variants obtained from NTR-1 Library 2.

[0043] FIGS. 9A and 9B depict the designed positions and the docked cofactor from NTR-1 Library 1 and NTR-1 Library 2.

[0044] FIG. 10 summarizes the results from the screening of variants from 4 computational libraries.

[0045] FIGS. 11A and 11B depict the kinetic parameters for 2 variants versus wild-type TR.

[0046] FIG. 12 depicts a summary of the best variants obtained from the NTR-1 Library 2 design.

[0047] FIGS. 13A and 13B summarize the activity of variants obtained from a high-complexity random RRR library. A summary of the variants obtained from this library is found in FIG. 13C.

[0048] FIG. 14 depicts a computational model for two of the clones.

[0049] FIG. 15 summarizes the enzymatic activities and kinetic parameters for some of the variants.

[0050] FIG. 16A depicts the nucleic acid sequence for the WVR variant.

[0051] FIG. 16B depicts the nucleic acid sequence for the WMG variant.

[0052] FIG. 16C depicts the nucleic acid sequence for the WIS variant.

[0053] FIG. 16D depicts the nucleic acid sequence for the WMS variant.

[0054] FIG. 16E depicts the nucleic acid sequence for the WLS variant.

[0055] FIG. 16F depicts the nucleic acid sequence for the WRT variant.

[0056] FIG. 16G depicts the nucleic acid sequence for the RYN variant.

[0057] FIG. 16H depicts the nucleic acid sequence for the RYN-A variant.

[0058] FIG. 16I depicts the nucleic acid sequence for the RFN variant.

[0059] FIG. 16J depicts the RRR-WT nucleic acid sequence.

[0060] FIG. 16K depicts the nucleic acid sequence for the WVG variant.

[0061] FIG. 16L depicts the nucleic acid sequence for the WRS variant.

[0062] FIG. 16M depicts the nucleic acid sequence for the WFQ variant.

[0063] FIG. 16N depicts the nucleic acid sequence for the NTR wild-type protein.

[0064] FIG. 16O depicts the nucleic acid sequence for the RYN-M variant.

[0065] FIG. 16P depicts the nucleic acid sequence for the RYN-L variant.

[0066] FIG. 16Q depicts the nucleic acid sequence for the RYN-I variant.

[0067] FIGS. 17A and 17B depict the alignment of the Arabidopsis NTR wild-type protein with several of the variants.

[0068] FIG. 18 is a computational representation of the critical RRR to RYN change described in Example 1.

[0069] FIG. 19 depicts a small sample of NAD conformations culled from the protein databank. The ball-and-stick model is the NAD_TDF conformer, which has a different ribose pucker than most of the others.

[0070] FIG. 20 depicts the library positions utilized in PDA simulations and in the generation of libraries 1 and 2.

[0071] FIG. 21 depicts the sequence alignment of several wild-type TR proteins. Sequences correspond to the following: 1) |P09625|TRXB_ECOLI; 2) |P80880|TRXB_BACSU; 3) |P46843|TRXB_MYCLE; 4) |P51978|TRXB_NEUCR; 5) |P29509|TRB1_YEAST; 6) |P38816|TRB2_YEAST; 7) |Q39243|TRB1_ARATH; 8) |Q39242|TRB2_ARATH; and 9) |Q16881|TRXB_HUMAN.

[0072] FIG. 22 depicts the amino acid sequences of several wild-type TR proteins. Sequences correspond to the following: A) |P09625|TRXB_ECOLI; B) |P80880|TRXB_BACSU; C) |P46843|TRXB_MYCLE; D) |P51978|TRXB_NEUCR; E) |P29509|TRB1_YEAST; F) |P38816|TRB2_YEAST; G) |Q39243|TRB1_ARATH; H) |Q39242|TRB2_ARATH; and I) |Q16881|TRXB_HUMAN.

DETAILED DESCRIPTION OF THE INVENTION

[0073] The present invention is directed to the generation of variant proteins and nucleic acids that exhibit altered cofactor specificity. The variant proteins may be generated using a number of different approaches, such as conventional mutagenesis approaches and computational processing approaches. Computational processing approaches have been previously described in U.S. Pat. Nos. 6,188,965 and 6,296,312, U.S. Ser. Nos. 09/419,351, 09/782,004, 09/927,79, and 09/877,695; all of which are expressly incorporated herein by reference in their entirety. In general, these applications describe a variety of computational modeling systems that allow the generation of extremely stable proteins. In this way, variants of wild-type proteins are generated that exhibit altered cofactor specificity as compared to wild-type proteins.

[0074] The methods of the present invention can be applied to any enzyme that exhibits a preference for one cofactor over another. For example, reductase enzymes often exhibit a preference for one cofactor versus another. In addition, the methods of the present invention can be applied to change the substrate specificity of a target protein.

[0075] In particular, the methods of the present invention can be used to change the cofactor preference from NADPH to NADH. NADPH is an expensive reductant, and its expense has prevented the wide use of thioredoxin systems in reducing food allergens and in venom treatments. Thus, there is a need in the art for other systems that achieve the same results as NADP-thioredoxin reductase systems but at lower cost. One such approach is to generate variants of thioredoxin reductase with altered cofactor specificity.

[0076] Accordingly, the present invention provides methods for altering the cofactor specificity of a target protein. By “altering” herein, or grammatical equivalents thereof, in the context of a polypeptide is meant a change in any characteristic or attribute of a polypeptide that can be selected or detected and compared to the corresponding property of a naturally occurring protein. These properties include, but are not limited to, cofactor specificity, cytotoxic activity, oxidative stability, substrate specificity, substrate binding or catalytic activity, thermal stability, alkaline stability, pH activity profile, resistance to proteolytic degradation, kinetic association (K_on) and dissociation (K_off) rate, protein folding, inducing an immune response, ability to bind to a ligand, ability to bind to a receptor, ability to be secreted, ability to be displayed on the surface of a cell, ability to oligomerize, ability to signal, ability to stimulate cell proliferation, ability to inhibit cell proliferation, ability to induce apoptosis, ability to be modified by phosphorylation or glycosylation, and ability to treat disease.

[0077] Unless otherwise specified, a substantial change in any of the above-listed properties, when comparing the property of a variant polypeptide of the present invention to the property of a target protein or wild-type protein, is preferably at least a 20%, more preferably at least a 50%, and even more preferably at least a 2-fold increase or decrease.
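One way to apply these thresholds to a measured property is to compare the variant value with the reference value as a ratio. The Python sketch below is an illustrative reading only: it treats a 2-fold change as a ratio of at least 2 or at most 0.5, and the percentage tiers as the absolute relative change from the reference. The function name change_tier is hypothetical.

```python
def change_tier(variant_value, reference_value):
    """Classify a variant/reference comparison against the 20%, 50% and 2-fold tiers."""
    if reference_value <= 0:
        raise ValueError("reference value must be positive")
    ratio = variant_value / reference_value
    relative = abs(ratio - 1.0)          # fractional increase or decrease
    if ratio >= 2.0 or ratio <= 0.5:     # 2-fold increase or 2-fold decrease
        return "at least 2-fold"
    if relative >= 0.5:
        return "at least 50%"
    if relative >= 0.2:
        return "at least 20%"
    return "not substantial"


print(change_tier(16.0, 10.0))  # at least 50%
print(change_tier(4.0, 10.0))   # at least 2-fold
```

Test values that sit exactly on a threshold (e.g., 12.0 versus 10.0) can fall on either side because of floating-point rounding, so a real implementation may want an explicit tolerance.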

[0078] By “cofactor specificity” herein is meant the cofactor preference of an enzyme. By “cofactor” herein is meant coenzymes, such as NADPH and NADH, that participate in oxidation/reduction reactions. Thus, if a target protein exhibits a preference for one cofactor over another, the methods of the present invention may be used to alter the cofactor preference of the target enzyme, such that the preference for the less favored cofactor is increased by 20%, 50%, 100%, 300%, 500%, 1000%, or up to 2000%. For example, a number of reductase enzymes favor NADPH over NADH (see WO 02/22526; WO 02/29019; Mittl, P. R., et al., (1994) Protein Sci., 3: 1504-14; Banta, S., et al., (2002) Protein Eng., 15: 131-140; all of which are hereby incorporated by reference in their entirety). As the availability of NADPH is often limiting, both in vivo and in vitro, the overall activity of the target protein is often limited. For target proteins that prefer NADPH as a cofactor, it would be desirable to alter the cofactor specificity of the target protein to a cofactor that is more readily available, such as NADH.

[0079] In a preferred embodiment, the cofactor specificity of the target protein is switched. By “switched” herein is meant that the cofactor preference (e.g., affinity) of a target protein is changed to another cofactor. Preferably, in one embodiment, switching cofactor specificity reduces activity with the cofactor preferred by the wild-type enzyme while increasing activity with the less preferred cofactor. For example, if a target protein prefers NADPH, switching the preference to NADH would result in a variant TR having, with NADH, at least 50% of the native NADPH-dependent activity. More preferably, the variant TRs will have at least 75%, and still more preferably 85%, 95%, or up to 100%, of native NADPH-dependent activity using NADH. Alternatively, in another embodiment, the alternate cofactor affinity is increased without a decrease in preferred cofactor affinity. In yet other embodiments, the affinity for both cofactors is changed simultaneously.
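The switching criteria above compare a variant's activity with the alternate cofactor against the wild-type enzyme's activity with its native cofactor. The Python sketch below illustrates that arithmetic; the activity values are hypothetical inputs in arbitrary but consistent units, and the helper names percent_of_native, highest_tier_met and is_switched are illustrative only.

```python
def percent_of_native(variant_activity_alt, wild_type_activity_native):
    """Variant activity with the alternate cofactor as % of wild-type native activity."""
    if wild_type_activity_native <= 0:
        raise ValueError("wild-type activity must be positive")
    return 100.0 * variant_activity_alt / wild_type_activity_native


def highest_tier_met(percent, tiers=(100, 95, 85, 75, 50)):
    """Return the highest threshold from paragraph [0079] that is met, or None."""
    for tier in tiers:  # tiers are checked from highest to lowest
        if percent >= tier:
            return tier
    return None


def is_switched(wt_nadph, wt_nadh, var_nadph, var_nadh):
    """True when NADPH activity falls and NADH activity rises relative to wild type."""
    return var_nadph < wt_nadph and var_nadh > wt_nadh


# Hypothetical activities (arbitrary units).
pct = percent_of_native(variant_activity_alt=1.6, wild_type_activity_native=2.0)
print(round(pct, 1), highest_tier_met(pct))  # 80.0 75
print(is_switched(2.0, 0.1, 0.5, 1.6))       # True
```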

[0080] In a preferred embodiment, the catalytic efficiency of the target protein for a cofactor is enhanced. By “enhanced catalytic efficiency” herein is meant that the activity with the cofactor is significantly improved. Catalytic efficiency may be improved for the preferred cofactor or, in those embodiments where the cofactor specificity is altered, for the altered cofactor.
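Catalytic efficiency is commonly quantified as k_cat/K_m. The Python sketch below computes that ratio, the fold gain of a variant over a reference enzyme, and the Michaelis-Menten initial rate; the kinetic constants in the example are hypothetical placeholders, not measured values for any TR variant.

```python
from dataclasses import dataclass


@dataclass
class Kinetics:
    kcat: float  # turnover number (s^-1)
    km: float    # Michaelis constant for the cofactor (uM)

    @property
    def efficiency(self):
        """Catalytic efficiency kcat/Km (uM^-1 s^-1)."""
        return self.kcat / self.km

    def rate(self, substrate_um, enzyme_um=1.0):
        """Michaelis-Menten initial rate v = kcat*[E]*[S] / (Km + [S]) (uM s^-1)."""
        return self.kcat * enzyme_um * substrate_um / (self.km + substrate_um)


def efficiency_gain(variant, reference):
    """Fold improvement in kcat/Km of a variant over a reference enzyme."""
    return variant.efficiency / reference.efficiency


# Hypothetical NADH kinetics for a wild-type enzyme and a variant.
wild_type_nadh = Kinetics(kcat=2.0, km=1000.0)
variant_nadh = Kinetics(kcat=4.0, km=200.0)
print(round(efficiency_gain(variant_nadh, wild_type_nadh), 2))  # 10.0
```

In this example the gain comes mostly from the lower K_m, which is the kind of change one would expect if a design improves binding of the less favored cofactor.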

[0081] In a preferred embodiment, the binding affinity of the target protein for a cofactor is enhanced. A change in binding affinity is evidenced by an increase or decrease of at least 5% in binding affinity compared to the wild-type target protein. In certain embodiments, variant proteins of the present invention may show greater than 100 times more affinity for one cofactor than for another, while in other embodiments the variant protein may show greater than 50 times, or greater than 25 times, more affinity for one cofactor than for another.
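Because affinity varies inversely with the dissociation constant K_d, the fold preference for one cofactor over another can be estimated as the ratio of the two K_d values. The Python sketch below assumes K_d values measured under comparable conditions; the numbers and the function names affinity_fold, selectivity_tier and affinity_change_percent are hypothetical.

```python
def affinity_fold(kd_preferred, kd_other):
    """How many times more tightly the preferred cofactor binds (ratio of Kd values)."""
    return kd_other / kd_preferred


def selectivity_tier(fold, tiers=(100, 50, 25)):
    """Return the highest 'greater than N times' tier from [0081] that is exceeded."""
    for tier in tiers:
        if fold > tier:
            return tier
    return None


def affinity_change_percent(kd_variant, kd_wild_type):
    """Percent change in affinity (taken as 1/Kd) of a variant relative to wild type."""
    return 100.0 * (kd_wild_type / kd_variant - 1.0)


# Hypothetical dissociation constants in uM.
fold = affinity_fold(kd_preferred=5.0, kd_other=400.0)
print(fold, selectivity_tier(fold))         # 80.0 50
print(affinity_change_percent(40.0, 50.0))  # 25.0
```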

[0082] In a preferred embodiment, the substrate specificity of the target protein is altered. For example, if a target protein typically acts on a substrate from the same species, the substrate specificity of the target protein may be changed such that the variant protein acts on substrates from other species. Accordingly, the present invention is directed to methods for altering the cofactor specificity of a target protein. By “target protein” or “scaffold protein” or grammatical equivalents herein is meant at least two covalently attached amino acids, which includes proteins, polypeptides, oligopeptides and peptides. The protein may be made up of naturally occurring amino acids and peptide bonds, or synthetic peptidomimetic structures, i.e., “analogs” such as peptoids [see Simon et al., Proc. Natl. Acad. Sci. U.S.A. 89(20):9367-71 (1992)], generally depending on the method of synthesis. Thus “amino acid”, or “peptide residue”, as used herein means both naturally occurring and synthetic amino acids. For example, homophenylalanine, citrulline, and norleucine are considered amino acids for the purposes of the invention. “Amino acid” also includes imino acid residues such as proline and hydroxyproline. In addition, any amino acid representing a component of the variant proteins of the present invention can be replaced by the same amino acid but of the opposite chirality. Thus, any amino acid naturally occurring in the L-configuration (which may also be referred to as the R or S, depending upon the structure of the chemical entity) may be replaced with an amino acid of the same chemical structural type, but of the opposite chirality, generally referred to as the D-amino acid but which can additionally be referred to as the R- or the S-, depending upon its composition and chemical configuration.
Such derivatives have the property of greatly increased stability, and therefore are advantageous in the formulation of compounds which may have longer in vivo half-lives when administered by oral, intravenous, intramuscular, intraperitoneal, topical, rectal, intraocular, or other routes. In the preferred embodiment, the amino acids are in the (S) or L-configuration. If non-naturally occurring side chains are used, non-amino acid substituents may be used, for example to prevent or retard in vivo degradation. Proteins including non-naturally occurring amino acids may be synthesized or, in some cases, made recombinantly; see van Hest et al., FEBS Lett 428:(1-2) 68-70 May 22, 1998 and Tang et al., Abstr. Pap. Am. Chem. S. 218:U138-U138 Part 2 Aug. 22, 1999, both of which are expressly incorporated by reference herein.

[0083] Aromatic amino acids may be replaced with D- or L-naphthylalanine, D- or L-phenylglycine, D- or L-2-thienylalanine, D- or L-1-, 2-, 3- or 4-pyrenylalanine, D- or L-3-thienylalanine, D- or L-(2-pyridinyl)-alanine, D- or L-(3-pyridinyl)-alanine, D- or L-(2-pyrazinyl)-alanine, D- or L-(4-isopropyl)-phenylglycine, D-(trifluoromethyl)-phenylglycine, D-(trifluoromethyl)-phenylalanine, D-p-fluorophenylalanine, D- or L-p-biphenylphenylalanine, D- or L-p-methoxybiphenylphenylalanine, D- or L-2-indole(alkyl)alanines, and D- or L-alkylalanines where alkyl may be substituted or unsubstituted methyl, ethyl, propyl, hexyl, butyl, pentyl, isopropyl, iso-butyl, sec-isotyl, iso-pentyl, non-acidic amino acids, of C1-C20. Acidic amino acids can be substituted with non-carboxylate amino acids while maintaining a negative charge, and derivatives or analogs thereof, such as the non-limiting examples of (phosphono)alanine, glycine, leucine, isoleucine, threonine, or serine; or sulfated (e.g., -SO₃H) threonine, serine, tyrosine. Other substitutions may include unnatural hydroxylated amino acids, which may be made by combining “alkyl” with any natural amino acid. The term “alkyl” as used herein refers to a branched or unbranched saturated hydrocarbon group of 1 to 24 carbon atoms, such as methyl, ethyl, n-propyl, isopropyl, n-butyl, isobutyl, t-butyl, octyl, decyl, tetradecyl, hexadecyl, eicosyl, tetracosyl and the like. Alkyl includes heteroalkyl, with atoms of nitrogen, oxygen and sulfur. Preferred alkyl groups herein contain 1 to 12 carbon atoms. Basic amino acids may be substituted with alkyl groups at any position of the naturally occurring amino acids lysine, arginine, ornithine, citrulline, or (guanidino)-acetic acid, or other (guanidino)alkyl-acetic acids, where “alkyl” is defined as above. Nitrile derivatives (e.g., containing the CN-moiety in place of COOH) may also be substituted for asparagine or glutamine, and methionine sulfoxide may be substituted for methionine.
Methods of preparation of such peptide derivatives are well known to one skilled in the art.

[0084] In addition, any amide linkage in any of the variant polypeptides can be replaced by a ketomethylene moiety. Such derivatives are expected to have the property of increased stability to degradation by enzymes, and therefore possess advantages for the formulation of compounds which may have increased in vivo half-lives, as administered by oral, intravenous, intramuscular, intraperitoneal, topical, rectal, intraocular, or other routes. Additional amino acid modifications of amino acids of variant polypeptides of the present invention may include the following. Cysteinyl residues may be reacted with alpha-haloacetates (and corresponding amides), such as 2-chloroacetic acid or chloroacetamide, to give carboxymethyl or carboxyamidomethyl derivatives. Cysteinyl residues may also be derivatized by reaction with compounds such as bromotrifluoroacetone, alpha-bromo-beta-(5-imidazoyl)propionic acid, chloroacetyl phosphate, N-alkylmaleimides, 3-nitro-2-pyridyl disulfide, methyl 2-pyridyl disulfide, p-chloromercuribenzoate, 2-chloromercuri-4-nitrophenol, or chloro-7-nitrobenzo-2-oxa-1,3-diazole. Histidyl residues may be derivatized by reaction with compounds such as diethylpyrocarbonate, e.g., at pH 5.5-7.0, because this agent is relatively specific for the histidyl side chain; para-bromophenacyl bromide may also be used, e.g., where the reaction is preferably performed in 0.1 M sodium cacodylate at pH 6.0. Lysinyl and amino-terminal residues may be reacted with compounds such as succinic or other carboxylic acid anhydrides. Derivatization with these agents is expected to have the effect of reversing the charge of the lysinyl residues. Other suitable reagents for derivatizing alpha-amino-containing residues include compounds such as imidoesters, e.g., methyl picolinimidate; pyridoxal phosphate; pyridoxal; chloroborohydride; trinitrobenzenesulfonic acid; O-methylisourea; 2,4-pentanedione; and transaminase-catalyzed reaction with glyoxylate.
Arginyl residues may be modified by reaction with one or several conventional reagents, among them phenylglyoxal, 2,3-butanedione, 1,2-cyclohexanedione, and ninhydrin, according to known method steps. Derivatization of arginine residues requires that the reaction be performed in alkaline conditions because of the high pKa of the guanidine functional group. Furthermore, these reagents may react with the epsilon-amino group of lysine as well as with the arginine guanidino group. The specific modification of tyrosyl residues per se is well known, such as for introducing spectral labels into tyrosyl residues by reaction with aromatic diazonium compounds or tetranitromethane. N-acetylimidazole and tetranitromethane may be used to form O-acetyl tyrosyl species and 3-nitro derivatives, respectively. Carboxyl side groups (aspartyl or glutamyl) may be selectively modified by reaction with carbodiimides (R′-N=C=N-R′) such as 1-cyclohexyl-3-(2-morpholinyl-(4-ethyl)) carbodiimide or 1-ethyl-3-(4-azonia-4,4-dimethylpentyl) carbodiimide.

[0085] Furthermore, aspartyl and glutamyl residues may be converted to asparaginyl and glutaminyl residues by reaction with ammonium ions. Glutaminyl and asparaginyl residues are frequently deamidated to the corresponding glutamyl and aspartyl residues. Alternatively, these residues may be deamidated under mildly acidic conditions. Either form of these residues falls within the scope of the present invention.

[0086] The target or scaffold protein may be any protein for which a three-dimensional structure is known or can be generated; that is, for which there are three-dimensional coordinates for each atom of the protein. Generally, this can be determined using X-ray crystallographic techniques, NMR techniques, de novo modeling, homology modeling, etc. In general, if X-ray structures are used, structures at 2 Å resolution or better are preferred, but not required.

[0087] The target or scaffold proteins of the present invention may be from prokaryotes and eukaryotes, such as bacteria (including extremophiles such as the archaebacteria), fungi, insects, fish, plants, and mammals. Suitable mammals include, but are not limited to, rodents (rats, mice, hamsters, guinea pigs, etc.), primates, farm animals (including sheep, goats, pigs, cows, horses, etc.) and, in the most preferred embodiment, humans.

[0088] Thus, by “target protein” or “scaffold protein” herein is meant a protein for which a variant protein or a library of variant proteins, preferably with altered cofactor specificity, is desired. As will be appreciated by those in the art, any number of target proteins find use in the present invention. Specifically included within the definition of “protein” are fragments and domains of known proteins, including functional domains such as enzymatic domains, binding domains, etc., and smaller fragments, such as turns, loops, etc. That is, portions of proteins may be used as well. In addition, “protein” as used herein includes proteins, oligopeptides and peptides. In addition, protein variants, i.e., non-naturally occurring protein analog structures, may be used.

[0089] Suitable proteins include, but are not limited to, industrial, pharmaceutical, and agricultural proteins. Suitable classes of enzymes include, but are not limited to, reductases; hydrolases such as proteases, carbohydrases, and lipases; isomerases such as racemases, epimerases, tautomerases, or mutases; transferases; kinases; oxidoreductases; dehydrogenases; and phosphatases. Suitable enzymes are listed in the Swiss-Prot enzyme database. Suitable protein backbones include, but are not limited to, all of those found in the protein database compiled and serviced by the Research Collaboratory for Structural Bioinformatics (RCSB, formerly the Brookhaven National Lab).

[0090] Specifically, preferred target proteins include reductases, such as thioredoxin reductase (US Pub. No. 2002/0037303), 2,5-diketo-D-gluconic acid reductase (Banta, S., et al., (2002) Protein Eng., 15: 131-140; WO 02/22527; WO 02/29019), glutathione reductase (Mittl, P. R., et al. (1993) J. Mol. Biol., 231: 191-5; Mittl & Schulz, (1994) Protein Sci., 3: 799-809; Mittl, P. R., et al., (1994) Protein Sci., 3: 1504-14), the alkyl hydroperoxide reductase system (Wood, Z. A., et al., (2001), Biochemistry, 40: 3900-3911), and thioredoxin reductase-like proteins (Reynolds, C. M., et al., (2002) Biochemistry, 41: 1990-2001).

[0091] Accordingly, the present invention is directed to computational processing methods for altering the cofactor specificity of the target protein. Once a set of coordinates for a target protein or scaffold protein is imported, a protein design cycle is implemented to generate a set of variant protein sequences with altered affinity for a desired receptor. By “protein design cycle” herein is meant any one of a number of protein design algorithms that can be used to produce a sequence or sequences, including but not limited to Protein Design Automation™ (PDA™), sequence prediction algorithm (SPA), various force field calculations, etc. See U.S. Pat. Nos. 6,188,965 and 6,296,312, U.S. Ser. Nos. 09/419,351, 09/782,004, 09/927,79, 09/877,695; Raha, K., et al. (2000) Protein Sci., 9:1106-1119; U.S. Ser. No. 09/877,695, filed Jun. 8, 2001, entitled “Apparatus and Method for Designing Proteins and Protein Libraries”; U.S. Ser. Nos. 09/927,790, 60/352,103, and 60/351,937, all of which are expressly incorporated herein by reference in their entirety.

[0092] In a preferred embodiment, the methods of the invention involve starting with a target protein and using computational processing to generate a candidate or variant protein or a set of primary sequences. In a preferred embodiment, sequence-based methods are used. Alternatively, structure-based methods, such as PDA™, described in detail below, are used. Other models for assessing the relative energies of sequences with high precision include Warshel, Computer Modeling of Chemical Reactions in Enzymes and Solutions, Wiley & Sons, New York, (1991), hereby expressly incorporated by reference.

[0093] Similarly, molecular dynamics calculations can be used to computationally screen sequences by individually calculating mutant sequence scores and compiling a rank-ordered list.

[0094] In a preferred embodiment, residue pair potentials can be used to score sequences (Miyazawa et al., Macromolecules 18(3):534-552 (1985), expressly incorporated by reference) during computational screening.
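As an illustration of scoring with a residue pair potential, the sketch below sums a contact energy over residue pairs that are assumed to be in contact and ranks candidate sequences by that sum. The contact map and the energy table are hypothetical placeholders, not the published Miyazawa-Jernigan parameters; a real screen would take contacts from the template structure and energies from a published table.

```python
from typing import Dict, List, Tuple

# Hypothetical contact energies in arbitrary units; NOT published Miyazawa-Jernigan values.
# Keys are alphabetically sorted residue pairs, so ("K", "E") is looked up as ("E", "K").
PAIR_TABLE: Dict[Tuple[str, str], float] = {
    ("L", "L"): -1.0,
    ("E", "K"): -0.5,
    ("E", "E"): 0.3,
}


def pair_score(seq: str, contacts: List[Tuple[int, int]],
               table: Dict[Tuple[str, str], float] = PAIR_TABLE) -> float:
    """Sum the pair energy over every residue index pair (i, j) in `contacts`.

    Residue pairs absent from the table contribute 0.
    """
    total = 0.0
    for i, j in contacts:
        a, b = sorted((seq[i], seq[j]))
        total += table.get((a, b), 0.0)
    return total


def rank_sequences(seqs: List[str],
                   contacts: List[Tuple[int, int]]) -> List[Tuple[str, float]]:
    """Return (sequence, score) pairs, most favorable (lowest score) first."""
    return sorted(((s, pair_score(s, contacts)) for s in seqs), key=lambda x: x[1])


if __name__ == "__main__":
    contacts = [(0, 3), (1, 2)]  # hypothetical contact map for a 4-residue segment
    for seq, score in rank_sequences(["LEEL", "LEKL", "AEKA"], contacts):
        print(seq, round(score, 2))
```

Sorting keys keeps the table symmetric without storing each pair twice.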

[0095] In a preferred embodiment, sequence profile scores (Bowie et al., Science 253(5016):164-70 (1991), incorporated by reference) and/or potentials of mean force (Hendlich et al., J. Mol. Biol. 216(1):167-180 (1990), also incorporated by reference) can also be calculated to score sequences. These methods assess the match between a sequence and a 3D protein structure and hence can act to screen for fidelity to the protein structure. By using different scoring functions to rank sequences, different regions of sequence space can be sampled in the computational screen.

[0096] Furthermore, scoring functions can be used to screen for sequences that would create metal or cofactor binding sites in the protein (Hellinga, Fold Des. 3(1): R1-8 (1998), hereby expressly incorporated by reference). Similarly, scoring functions can be used to screen for sequences that would create disulfide bonds in the protein. These potentials attempt to specifically modify a protein structure to introduce a new structural motif.

[0097] In a preferred embodiment, sequence and/or structural alignment programs can be used to generate the variant proteins of the invention. As is known in the art, there are a number of sequence-based alignment programs, including, for example, Smith-Waterman searches, Needleman-Wunsch, Double Affine Smith-Waterman, frame search, Gribskov/GCG profile search, Gribskov/GCG profile scan, profile frame search, Bucher generalized profiles, Hidden Markov models, Hframe, Double Frame, Blast, Psi-Blast, Clustal, and GeneWise.

[0098] The source of the sequences can vary widely, and includes taking sequences from one or more of the known databases, including, but not limited to, SCOP (Hubbard, et al., Nucleic Acids Res 27(1):254-256 (1999)); PFAM (Bateman, et al., Nucleic Acids Res 27(1):260-262 (1999)); VAST (Gibrat, et al., Curr Opin Struct Biol 6(3):377-385 (1996)); CATH (Orengo, et al., Structure 5(8):1093-1108 (1997)); PhD Predictor (http://www.embl-heidelberg.de/predictprotein/predictprotein.html); Prosite (Hofmann, et al., Nucleic Acids Res 27(1):215-219 (1999)); PIR (http://www.mips.biochem.mpg.de/proj/protseqdb/); GenBank (http://www.ncbi.nlm.nih.gov/); PDB (www.rcsb.org); and BIND (Bader, et al., Nucleic Acids Res 29(1):242-245 (2001)). In addition, sequences from these databases can be subjected to contiguous analysis or gene prediction; see Wheeler, et al., Nucleic Acids Res 28(1):10-14 (2000) and Burge and Karlin, J Mol Biol 268(1):78-94 (1997).

[0099] As is known in the art, there are a number of sequence alignment methodologies that can be used. For example, sequence homology-based alignment methods can be used to create sequence alignments of proteins related to the target structure (Altschul et al., J. Mol. Biol. 215(3):403-410 (1990); Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997), both incorporated by reference). These sequence alignments are then examined to determine the observed sequence variations, which are tabulated to define a set of variant proteins.

[0100] Sequence-based alignments can be used in a variety of ways. For example, a number of related proteins can be aligned, as is known in the art, and the “variable” and “conserved” residues defined; that is, the residues that vary or remain identical between the family members can be defined. These results can be used to generate a probability table, as outlined below. Similarly, these sequence variations can be tabulated and a secondary library defined from them as defined below. Alternatively, the allowed sequence variations can be used to define the amino acids considered at each position during the computational screening. Another variation is to bias the score for amino acids that occur in the sequence alignment, thereby increasing the likelihood that they are found during computational screening but still allowing consideration of other amino acids. This bias would result in a focused library of variant proteins but would not eliminate from consideration amino acids not found in the alignment. In addition, a number of other types of bias may be introduced. For example, diversity may be forced; that is, a “conserved” residue is chosen and altered to force diversity on the protein and thus sample a greater portion of the sequence space. Alternatively, the positions of high variability between family members (i.e. low conservation) can be randomized, either using all or a subset of amino acids. Similarly, outlier residues, either positional outliers or side chain outliers, may be eliminated.
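The alignment-derived probability table and the soft bias toward aligned residues can be sketched as follows. This is a minimal illustration, assuming gapped, equal-length aligned strings; the helper names and the `floor` parameter (which keeps unobserved amino acids in play at a small weight) are hypothetical, and "conserved" is taken here to mean that only one residue type is observed at a position.

```python
from collections import Counter
from typing import Dict, List, Tuple


def probability_table(alignment: List[str]) -> List[Dict[str, float]]:
    """Per-position amino acid frequencies from equal-length aligned sequences.

    Gap characters ('-') are ignored when counting.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    table = []
    for pos in range(length):
        counts = Counter(s[pos] for s in alignment if s[pos] != "-")
        total = sum(counts.values())
        table.append({aa: n / total for aa, n in counts.items()} if total else {})
    return table


def split_positions(table: List[Dict[str, float]]) -> Tuple[List[int], List[int]]:
    """Return (conserved, variable) position indices.

    'Conserved' here means a single residue type is observed at the position.
    """
    conserved = [i for i, col in enumerate(table) if len(col) == 1]
    variable = [i for i, col in enumerate(table) if len(col) > 1]
    return conserved, variable


def biased_choices(column: Dict[str, float], allowed: str,
                   floor: float = 0.01) -> Dict[str, float]:
    """Favor aligned residues without excluding unobserved ones.

    Every residue in `allowed` receives its observed frequency plus `floor`,
    and the weights are renormalized to sum to 1. `floor` is a hypothetical knob.
    """
    weights = {aa: column.get(aa, 0.0) + floor for aa in allowed}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


if __name__ == "__main__":
    family = ["ACDE", "ACNE", "GCDE"]   # made-up aligned homologs
    table = probability_table(family)
    print(split_positions(table))        # conserved vs. variable positions
    print(biased_choices(table[0], "ADGLV"))
```

Setting `floor` to 0 reduces the bias to a hard restriction to the aligned residues, the stricter alternative the paragraph also describes.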

[0101] Similarly, structural alignment of structurally related proteins can be done to generate sequence alignments. There are a wide variety of such structural alignment programs known. See, for example, VAST from the NCBI (http://www.ncbi.nlm.nih.gov:80/Structure/VAST/vast.shtml); SSAP (Orengo and Taylor, Methods Enzymol 266:617-635 (1996)); SARF2 (Alexandrov, Protein Eng 9(9):727-732 (1996)); CE (Shindyalov and Bourne, Protein Eng 11(9):739-747 (1998)); (Orengo et al., Structure 5(8):1093-108 (1997)); and Dali (Holm et al., Nucleic Acids Res. 26(1):316-9 (1998)), all of which are incorporated by reference. These sequence alignments can then be examined to determine the observed sequence variations. Libraries can be generated by predicting secondary structure from sequence, and then selecting sequences that are compatible with the predicted secondary structure. There are a number of secondary structure prediction methods, such as helix-coil transition theory (Munoz and Serrano, Biopolymers 41:495, 1997), neural networks, local structure alignment and others (e.g., see Selbig et al., Bioinformatics 15:1039-46, 1999).

[0102] Similarly, as outlined above, other computational methods are known, including, but not limited to, sequence profiling [Bowie and Eisenberg, Science 253(5016):164-70, (1991)]; rotamer library selections [Dahiyat and Mayo, Protein Sci. 5(5):895-903 (1996); Dahiyat and Mayo, Science 278(5335):82-7 (1997); Desjarlais and Handel, Protein Science 4:2006-2018 (1995); Harbury et al., Proc. Natl. Acad. Sci. U.S.A. 92(18):8408-8412 (1995); Kono et al., Proteins: Structure, Function and Genetics 19:244-255 (1994); Hellinga and Richards, Proc. Natl. Acad. Sci. U.S.A. 91:5803-5807 (1994)]; residue pair potentials [Jones, Protein Science 3: 567-574, (1994)]; PROSA [Hendlich et al., J. Mol. Biol. 216:167-180 (1990)]; THREADER [Jones et al., Nature 358:86-89 (1992)]; and other inverse folding methods such as those described by Simons et al. [Proteins, 34:535-543, (1999)], Levitt and Gerstein [Proc. Natl. Acad. Sci. U.S.A., 95:5913-5920, (1998)], Godzik and Skolnick [Proc. Natl. Acad. Sci. U.S.A., 89:12098-102, (1992)], Godzik et al. [J. Mol. Biol. 227:227-38, (1992)] and two profile methods [Gribskov et al. Proc. Natl. Acad. Sci. U.S.A. 84:4355-4358 (1987) and Fischer and Eisenberg, Protein Sci. 5:947-955 (1996), Rice and Eisenberg J. Mol. Biol. 267:1026-1038 (1997)], all of which are expressly incorporated by reference.

[0103] In addition, other computational methods such as those described by Koehl and Levitt (J. Mol. Biol. 293:1161-1181 (1999); J. Mol. Biol. 293:1183-1193 (1999); expressly incorporated by reference) can be used to create a variant library that can optionally then be used to generate a smaller secondary library for use in experimental screening for improved properties and function. In addition, computational methods based on force-field calculations, such as SCMF, can be used as well; for SCMF, see Delarue et al. Pac. Symp. Biocomput. 109-21 (1997); Koehl et al., J. Mol. Biol. 239:249-75 (1994); Koehl et al., Nat. Struct. Biol. 2:163-70 (1995); Koehl et al., Curr. Opin. Struct. Biol. 6:222-6 (1996); Koehl et al., J. Mol. Biol. 293:1183-93 (1999); Koehl et al., J. Mol. Biol. 293:1161-81 (1999); Lee, J. Mol. Biol. 236:918-39 (1994); and Vasquez, Biopolymers 36:53-70 (1995); all of which are expressly incorporated by reference. Other force-field calculations that can be used to optimize the conformation of a sequence within a computational method, or to generate de novo optimized sequences as outlined herein, include, but are not limited to, OPLS-AA [Jorgensen et al., J. Am. Chem. Soc. 118:11225-11236 (1996); Jorgensen, W. L.; BOSS, Version 4.1; Yale University: New Haven, Conn. (1999)]; OPLS [Jorgensen et al., J. Am. Chem. Soc. 110:1657ff (1988); Jorgensen et al., J. Am. Chem. Soc. 112:4768ff (1990)]; UNRES [United Residue Forcefield; Liwo et al., Protein Science 2:1697-1714 (1993); Liwo et al., Protein Science 2:1715-1731 (1993); Liwo et al., J. Comp. Chem. 18:849-873 (1997); Liwo et al., J. Comp. Chem. 18:874-884 (1997); Liwo et al., J. Comp. Chem. 19:259-276 (1998)]; Forcefield for Protein Structure Prediction [Liwo et al., Proc. Natl. Acad. Sci. U.S.A. 96:5482-5485 (1999)]; ECEPP/3 [Liwo et al., J. Protein Chem. 13(4):375-80 (1994)]; AMBER 1.1 force field [Weiner et al., J. Am. Chem. Soc. 106:765-784]; AMBER 3.0 force field [U. C. Singh et al., Proc. Natl. Acad. Sci. U.S.A. 82:755-759 (1985)]; CHARMM and CHARMM22 [Brooks et al., J. Comp. Chem. 4:187-217]; cvff3.0 [Dauber-Osguthorpe et al., Proteins: Structure, Function and Genetics, 4:31-47 (1988)]; cff91 [Maple et al., J. Comp. Chem. 15:162-182]; also, the DISCOVER (cvff and cff91) and AMBER force fields are used in the INSIGHT molecular modeling package (Biosym/MSI, San Diego Calif.) and CHARMM is used in the QUANTA molecular modeling package (Biosym/MSI, San Diego Calif.), all of which are expressly incorporated by reference. In fact, as is outlined below, these force-field methods may be used to generate the variant TR library directly; these methods can be used to generate a probability table from which an additional library is directly generated.

[0104] In a preferred embodiment, Protein Design Automation™ (PDA™) is used to generate a variable protein sequence comprising a defined energy state for each amino acid position, as is described in U.S. Pat. Nos. 6,188,965 and 6,296,312, both of which are expressly incorporated herein by reference. Briefly, PDA™ can be described as follows. A known protein structure is used as the starting point. The residues to be optimized are then identified, which may be the entire sequence or subset(s) thereof. The side chains of any positions to be varied are then removed. The resulting structure, consisting of the protein backbone and the remaining side chains, is called the template. Each variable residue position is then preferably classified as a core residue, a surface residue, or a boundary residue; each classification defines a subset of possible amino acid residues for the position (for example, core residues generally will be selected from the set of hydrophobic residues, surface residues generally will be selected from the hydrophilic residues, and boundary residues may be either). Each amino acid can be represented by a discrete set of all allowed conformers of each side chain, called rotamers. Thus, to arrive at an optimal sequence for a backbone, all possible sequences of rotamers must be screened, where each backbone position can be occupied either by each amino acid in all its possible rotameric states, or by a subset of amino acids, and thus a subset of rotamers.

[0105] Two sets of interactions are then calculated for each rotamer at every position: the interaction of the rotamer side chain with all or part of the backbone (the “singles” energy, also called the rotamer/template or rotamer/backbone energy), and the interaction of the rotamer side chain with all other possible rotamers at every other position or a subset of the other positions (the “doubles” energy, also called the rotamer/rotamer energy). The energy of each of these interactions is calculated through the use of a variety of scoring functions, which include the energy of van der Waals forces, the energy of hydrogen bonding, the energy of secondary structure propensity, the energy of surface area solvation, and the electrostatics. Thus, the total energy of each rotamer interaction, both with the backbone and with other rotamers, is calculated and stored in matrix form.

[0106] The discrete nature of rotamer sets allows a simple calculation of the number of rotamer sequences to be tested. A backbone of length n with m possible rotamers per position will have $m^n$ possible rotamer sequences, a number which grows exponentially with sequence length and renders the calculations either unwieldy or impossible in real time; for example, 100 variable positions with 10 rotamers each already give $10^{100}$ rotamer sequences. Accordingly, to solve this combinatorial search problem, a “Dead End Elimination” (DEE) calculation is performed. The DEE calculation is based on the fact that if the worst total interaction of a first rotamer is still better than the best total interaction of a second rotamer at the same position, then the second rotamer cannot be part of the global optimum solution. Since the energies of all rotamers have already been calculated, the DEE approach only requires sums over the sequence length to test and eliminate rotamers, which speeds up the calculations considerably. DEE can be rerun comparing pairs of rotamers, or combinations of rotamers, which will eventually result in the determination of a single sequence which represents the global optimum energy.
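The elimination criterion described above can be sketched in Python as follows. This is a minimal illustration of the original singles-versus-singles DEE test only, not the pairwise or split DEE variants, and it assumes a hypothetical data layout: `singles[i][r]` holds the rotamer/template energy and `pairs[(i, r, j, s)]` the rotamer/rotamer energy for positions i < j. A brute-force search over all m^n rotamer sequences is included only to check the result on toy inputs, and all example energies are made up.

```python
import itertools
import random
from typing import Dict, List, Set, Tuple

# Hypothetical layout: pairs[(i, r, j, s)] is stored once per position pair, with i < j.
Pairs = Dict[Tuple[int, int, int, int], float]


def pair_e(pairs: Pairs, i: int, r: int, j: int, s: int) -> float:
    """Rotamer/rotamer energy between rotamer r at position i and rotamer s at j."""
    return pairs[(i, r, j, s)] if i < j else pairs[(j, s, i, r)]


def dee(singles: List[List[float]], pairs: Pairs) -> List[Set[int]]:
    """Original (singles-versus-singles) dead-end elimination.

    Rotamer r at position i is removed when some other rotamer t at the same
    position has a worst-case total energy lower than r's best-case total energy.
    Returns the set of surviving rotamer indices at each position.
    """
    n = len(singles)
    alive = [set(range(len(row))) for row in singles]
    changed = True
    while changed:  # sweep again until a full pass eliminates nothing
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for r in sorted(alive[i]):
                best_r = singles[i][r] + sum(
                    min(pair_e(pairs, i, r, j, s) for s in alive[j]) for j in others)
                for t in sorted(alive[i] - {r}):
                    worst_t = singles[i][t] + sum(
                        max(pair_e(pairs, i, t, j, s) for s in alive[j]) for j in others)
                    if worst_t < best_r:
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


def brute_force_gmec(singles: List[List[float]], pairs: Pairs):
    """Exhaustive search over all m^n rotamer sequences (tiny inputs only)."""
    def energy(seq: Tuple[int, ...]) -> float:
        e = sum(singles[i][r] for i, r in enumerate(seq))
        e += sum(pair_e(pairs, i, seq[i], j, seq[j])
                 for i, j in itertools.combinations(range(len(seq)), 2))
        return e
    best = min(itertools.product(*(range(len(row)) for row in singles)), key=energy)
    return best, energy(best)


def random_instance(n: int, m: int, seed: int = 0):
    """Random toy energies (n positions, m rotamers each) for testing."""
    rng = random.Random(seed)
    singles = [[rng.uniform(-1, 1) for _ in range(m)] for _ in range(n)]
    pairs = {(i, r, j, s): rng.uniform(-1, 1)
             for i, j in itertools.combinations(range(n), 2)
             for r in range(m) for s in range(m)}
    return singles, pairs


# Made-up two-position example: rotamer 1 at each position is dominated.
EXAMPLE_SINGLES = [[0.0, 5.0], [0.0, 1.0]]
EXAMPLE_PAIRS = {(0, 0, 1, 0): -1.0, (0, 0, 1, 1): 0.0,
                 (0, 1, 1, 0): 0.0, (0, 1, 1, 1): 0.0}

if __name__ == "__main__":
    print(dee(EXAMPLE_SINGLES, EXAMPLE_PAIRS))               # surviving rotamers
    print(brute_force_gmec(EXAMPLE_SINGLES, EXAMPLE_PAIRS))  # exhaustive check
```

Because eliminated rotamers are excluded from later minima and maxima, each sweep can tighten the bounds and enable further eliminations, which is why the loop repeats until nothing changes.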

[0107] Once the global solution has been found, a Monte Carlo search may be done to generate a rank-ordered list of sequences in the neighborhood of the DEE solution. Starting at the DEE solution, random positions are changed to other rotamers, and the new sequence energy is calculated. If the new sequence meets the criteria for acceptance, it is used as a starting point for another jump. After a predetermined number of jumps, a rank-ordered list of sequences is generated. Monte Carlo searching is a sampling technique to explore sequence space around the global minimum or to find new local minima distant in sequence space. As is additionally outlined below, there are other sampling techniques that can be used, including Boltzmann sampling, genetic algorithm techniques and simulated annealing. In addition, for all the sampling techniques, the kinds of jumps allowed can be altered (e.g. random jumps to random residues, biased jumps (to or away from wild-type, for example), jumps to biased residues (to or away from similar residues, for example), etc.). Similarly, for all the sampling techniques, the criteria by which a sampling jump is accepted can be altered.
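A minimal sketch of the Monte Carlo step described above follows. The text does not fix the acceptance criteria, so the Metropolis rule with temperature `kT` is an assumed choice here, as are the single-position random jump and the data layout (`singles[i][r]` and `pairs[(i, r, j, s)]` with i < j). Every proposed sequence is recorded so that the rank-ordered list includes neighbors that were visited but not accepted.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Pairs = Dict[Tuple[int, int, int, int], float]  # pairs[(i, r, j, s)] with i < j


def seq_energy(seq: Sequence[int], singles: List[List[float]], pairs: Pairs) -> float:
    """Singles energy of every chosen rotamer plus the doubles energy of every pair."""
    e = sum(singles[i][r] for i, r in enumerate(seq))
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            e += pairs[(i, seq[i], j, seq[j])]
    return e


def monte_carlo(start: Sequence[int], singles: List[List[float]], pairs: Pairs,
                n_steps: int = 2000, kT: float = 1.0, keep: int = 10, seed: int = 0):
    """Rank-ordered list of (rotamer sequence, energy) pairs sampled around `start`.

    Each jump moves one random position to a random rotamer. Acceptance uses the
    Metropolis criterion, which is an assumed choice here. Every sequence that is
    proposed is recorded, whether or not the jump is accepted.
    """
    rng = random.Random(seed)
    current = tuple(start)
    e_cur = seq_energy(current, singles, pairs)
    seen = {current: e_cur}
    for _ in range(n_steps):
        i = rng.randrange(len(current))
        trial = current[:i] + (rng.randrange(len(singles[i])),) + current[i + 1:]
        e_trial = seq_energy(trial, singles, pairs)
        seen[trial] = e_trial
        delta = e_trial - e_cur
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, e_cur = trial, e_trial  # accepted: next jump starts here
    return sorted(seen.items(), key=lambda kv: kv[1])[:keep]


# Made-up energies; (0, 0) is taken to be the DEE solution used as the start point.
EXAMPLE_SINGLES = [[0.0, 5.0], [0.0, 1.0]]
EXAMPLE_PAIRS = {(0, 0, 1, 0): -1.0, (0, 0, 1, 1): 0.0,
                 (0, 1, 1, 0): 0.0, (0, 1, 1, 1): 0.0}

if __name__ == "__main__":
    for seq, energy in monte_carlo((0, 0), EXAMPLE_SINGLES, EXAMPLE_PAIRS):
        print(seq, energy)
```

Changing how `trial` is built (for example, biasing the new rotamer toward wild type) or replacing the Metropolis test corresponds to altering the jump types and acceptance criteria mentioned in the text.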

[0108] As outlined in U.S. Ser. No. 09/127,926, the protein backbone (comprising, for a naturally occurring protein, the nitrogen, the carbonyl carbon, the α-carbon, and the carbonyl oxygen, along with the direction of the vector from the α-carbon to the β-carbon) may be altered prior to the computational analysis by varying a set of parameters called supersecondary structure parameters.

[0109] Once a protein structure backbone is generated (with alterations, as outlined above) and input into the computer, explicit hydrogens are added if not included within the structure (for example, if the structure was generated by X-ray crystallography, hydrogens must be added). After hydrogen addition, energy minimization of the structure is run to relax the hydrogens as well as the other atoms, bond angles and bond lengths. In a preferred embodiment, this is done by running a number of steps of conjugate gradient minimization (Mayo et al., J. Phys. Chem. 94:8897 (1990)) of atomic coordinate positions to minimize the Dreiding force field with no electrostatics. Generally, from about 10 to about 250 steps is preferred, with about 50 being most preferred.

[0110] The protein backbone structure contains at least one variable residue position. As is known in the art, the residues, or amino acids, of proteins are generally numbered sequentially starting with the N-terminus of the protein. Thus, a protein having a methionine at its N-terminus is said to have a methionine at residue or amino acid position 1, with the next residues as 2, 3, 4, etc. At each position, the wild-type (i.e. naturally occurring) protein may have one of at least 20 amino acids, in any number of rotamers. By “variable residue position” herein is meant an amino acid position of the protein to be designed that is not fixed in the design method as a specific residue or rotamer, generally the wild-type residue or rotamer.

[0111] In a preferred embodiment, all of the residue positions of the protein are variable. That is, every amino acid side chain may be altered in the methods of the present invention. This is particularly desirable for smaller proteins, although the present methods allow the design of larger proteins as well. While there is no theoretical limit to the length of the protein that may be designed this way, there is a practical computational limit.

[0112] In an alternate preferred embodiment, only some of the residue positions of the protein are variable, and the remainder are “fixed”, that is, they are identified in the three-dimensional structure as being in a set conformation. In some embodiments, a fixed position is left in its original conformation (which may or may not correlate to a specific rotamer of the rotamer library being used). Alternatively, residues may be fixed as a non-wild-type residue; for example, when known site-directed mutagenesis techniques have shown that a particular residue is desirable (for example, to eliminate a proteolytic site or alter the substrate specificity of an enzyme), the residue may be fixed as a particular amino acid. Alternatively, the methods of the present invention may be used to evaluate mutations de novo, as is discussed below. In an alternate preferred embodiment, a fixed position may be “floated”; the amino acid at that position is fixed, but different rotamers of that amino acid are tested. In this embodiment, the variable residues may be at least one, or anywhere from 0.1% to 99.9% of the total number of residues. Thus, for example, it may be possible to change only a few (or one) residues, or most of the residues, with all possibilities in between.

[0113] In a preferred embodiment, residues which can be fixed include, but are not limited to, structurally or biologically functional residues; alternatively, biologically functional residues may specifically not be fixed. For example, residues which are known to be important for biological activity, such as the residues which form the active site of an enzyme, the substrate binding site of an enzyme, the binding site for a binding partner (ligand/receptor, antigen/antibody, etc.), phosphorylation or glycosylation sites which are crucial to biological function, or structurally important residues, such as disulfide bridges, metal binding sites, critical hydrogen bonding residues, residues critical for backbone conformation such as proline or glycine, residues critical for packing interactions, etc., may all be fixed in a conformation or as a single rotamer, or “floated”.

[0114] Similarly, residues which may be chosen as variable residues may be those that confer undesirable biological attributes, such as susceptibility to proteolytic degradation, dimerization or aggregation sites, glycosylation sites which may lead to immune responses, unwanted binding activity, unwanted allostery, undesirable enzyme activity but with a preservation of binding, etc.

[0115] In a preferred embodiment, each variable position is classified as either a core, surface or boundary residue position, although in some cases, as explained below, the variable position may be set to glycine to minimize backbone strain. In addition, as outlined herein, residues need not be classified; they can be chosen as variable and any set of amino acids may be used. Any combination of core, surface and boundary positions can be utilized: core, surface and boundary residues; core and surface residues; core and boundary residues; and surface and boundary residues, as well as core residues alone, surface residues alone, or boundary residues alone.

[0116] Classification of residue positions as core, surface or boundary may be done in several ways, as will be appreciated by those of skill in the art. In a preferred embodiment, the classification is done via a visual scan of the original protein backbone structure, including the side chains, and assigning a classification based on a subjective evaluation of one skilled in the art of protein modeling. Alternatively, a preferred embodiment utilizes an assessment of the orientation of the Cα-Cβ vectors relative to a solvent accessible surface computed using only the template Cα atoms, as outlined in U.S. Ser. Nos. 60/061,097, 60/043,464, 60/054,678, 09/127,926 and PCT US98/07254. Alternatively, a surface area calculation can be done.

[0117] Once each variable position is optionally classified as either core, surface or boundary, a set of amino acid side chains, and thus a set of rotamers, is assigned to each position. That is, the set of possible amino acid side chains that the program will allow to be considered at any particular position is chosen. Subsequently, once the possible amino acid side chains are chosen, the set of rotamers that will be evaluated at a particular position can be determined. Thus, a core residue will generally be selected from the group of hydrophobic residues consisting of alanine, valine, isoleucine, leucine, phenylalanine, tyrosine, tryptophan, and methionine (in some embodiments, when the α scaling factor of the van der Waals scoring function, described below, is low, methionine is removed from the set), and the rotamer set for each core position potentially includes rotamers for these eight amino acid side chains (all the rotamers if a backbone-independent library is used, and subsets if a backbone-dependent library is used). Similarly, surface positions are generally selected from the group of hydrophilic residues consisting of alanine, serine, threonine, aspartic acid, asparagine, glutamine, glutamic acid, arginine, lysine and histidine. The rotamer set for each surface position thus includes rotamers for these ten residues. Finally, boundary positions are generally chosen from alanine, serine, threonine, aspartic acid, asparagine, glutamine, glutamic acid, arginine, lysine, histidine, valine, isoleucine, leucine, phenylalanine, tyrosine, tryptophan, and methionine. The rotamer set for each boundary position thus potentially includes every rotamer for these seventeen residues (assuming cysteine, glycine and proline are not used, although they can be). Additionally, in some preferred embodiments, a set of 18 naturally occurring amino acids (all except cysteine and proline, which are known to be particularly disruptive) is used.
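The residue groups listed above translate directly into per-class sets, and multiplying the set sizes over the variable positions shows why classification shrinks the search. This sketch counts amino acid combinations only (rotamers would multiply the count further); the `drop_met` flag is a hypothetical stand-in for the low α scaling factor case, and the class label `"unclassified"` for the 18-residue set is likewise an illustrative name.

```python
from math import prod
from typing import Iterable

# Residue groups listed in the text, as one-letter codes.
CORE = frozenset("AVILFYWM")          # 8 hydrophobic residues
SURFACE = frozenset("ASTDNQERKH")     # 10 hydrophilic residues
BOUNDARY = CORE | SURFACE             # 17 residues: cysteine, glycine, proline excluded
NO_CYS_PRO = frozenset("ACDEFGHIKLMNPQRSTVWY") - {"C", "P"}  # the 18-residue set


def allowed_residues(position_class: str, drop_met: bool = False) -> frozenset:
    """Amino acids considered at a position of the given class.

    `drop_met` is a hypothetical flag standing in for the case where the van der
    Waals alpha scaling factor is low and methionine is removed from the core set.
    """
    groups = {"core": CORE, "surface": SURFACE, "boundary": BOUNDARY,
              "unclassified": NO_CYS_PRO}
    aas = groups[position_class]
    if drop_met and position_class == "core":
        aas = aas - {"M"}
    return aas


def sequence_space(classes: Iterable[str]) -> int:
    """Number of amino acid sequences (not rotamer sequences) to be considered."""
    return prod(len(allowed_residues(c)) for c in classes)


if __name__ == "__main__":
    design = ["core", "surface", "boundary"]
    print(sequence_space(design))                  # classified positions
    print(sequence_space(["unclassified"] * 3))    # same positions, 18-residue set
```

For three positions, classification gives 8 × 10 × 17 = 1,360 amino acid combinations, versus 18³ = 5,832 when the 18-residue set is allowed everywhere.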

[0118] Thus, as will be appreciated by those in the art, there is a computational benefit to classifying the residue positions, as it decreases the number of calculations. It should also be noted that there may be situations where the sets of core, boundary and surface residues are altered from those described above; for example, under some circumstances, one or more amino acids are either added to or subtracted from the set of allowed amino acids. For example, some proteins that dimerize or multimerize, or have ligand binding sites, may contain hydrophobic surface residues, etc. In addition, residues that do not allow helix “capping” or the favorable interaction with an α-helix dipole may be subtracted from a set of allowed residues. This modification of amino acid groups is done on a residue-by-residue basis.

[0119] In a preferred embodiment, proline, cysteine and glycine are not included in the list of possible amino acid side chains, and thus the rotamers for these side chains are not used. However, in a preferred embodiment, when the variable residue position has a Φ angle (that is, the dihedral angle defined by 1) the carbonyl carbon of the preceding amino acid; 2) the nitrogen atom of the current residue; 3) the α-carbon of the current residue; and 4) the carbonyl carbon of the current residue) greater than 0°, the position is set to glycine to minimize backbone strain.
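The glycine rule depends only on the sign of Φ, which can be computed from the four backbone atoms named above. The sketch below uses a standard projection-and-`atan2` formula for a signed dihedral angle (IUPAC sign convention); the function names are hypothetical, and the example coordinates are toy values chosen to give Φ = +90° and -90°, not real backbone geometry.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def phi(c_prev: Vec, n: Vec, ca: Vec, c: Vec) -> float:
    """Phi: C(i-1), N(i), CA(i), C(i), as defined in the text."""
    return dihedral(c_prev, n, ca, c)


def force_glycine(c_prev: Vec, n: Vec, ca: Vec, c: Vec) -> bool:
    """Apply the rule above: set the position to glycine when phi > 0 degrees."""
    return phi(c_prev, n, ca, c) > 0.0


if __name__ == "__main__":
    # Toy coordinates chosen to give phi = +90 and -90 degrees (not real geometry).
    print(force_glycine((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # positive phi
    print(force_glycine((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, -1, 1)))  # negative phi
```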

[0120] Once the group of potential rotamers is assigned for each variable residue position, processing proceeds as outlined in U.S. Ser. No. 09/127,926 and PCT US98/07254. This processing step entails analyzing interactions of the rotamers with each other and with the protein backbone to generate optimized protein sequences. Simplistically, the processing initially comprises the use of a number of scoring functions to calculate energies of interactions of the rotamers, either to the backbone itself or other rotamers. Preferred PDA scoring functions include, but are not limited to, a van der Waals potential scoring function, a hydrogen bond potential scoring function, an atomic solvation scoring function, a secondary structure propensity scoring function and an electrostatic scoring function. As is further described below, at least one scoring function is used to score each position, although the scoring functions may differ depending on the position classification or other considerations, like favorable interaction with an α-helix dipole. As outlined below, the total energy which is used in the calculations is the sum of the energy of each scoring function used at a particular position, as is generally shown in Equation 1:

$$E_{\text{total}} = nE_{\text{vdw}} + nE_{\text{as}} + nE_{\text{h-bonding}} + nE_{\text{ss}} + nE_{\text{elec}} \qquad \text{(Equation 1)}$$

[0121] In Equation 1, the total energy is the sum of the energy of the van der Waals potential ($E_{\text{vdw}}$), the energy of atomic solvation ($E_{\text{as}}$), the energy of hydrogen bonding ($E_{\text{h-bonding}}$), the energy of secondary structure ($E_{\text{ss}}$) and the energy of electrostatic interaction ($E_{\text{elec}}$). Each coefficient n is either 0 or 1, set independently for each term, depending on whether that term is to be considered for the particular residue position.
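Equation 1 amounts to a switched sum, which the sketch below writes out. The term names mirror the subscripts in Equation 1; the dictionary layout and the per-term energies are made-up illustrations, and treating a term missing from `switches` as n = 0 is an assumed convention.

```python
from typing import Dict

# Term names follow the subscripts of Equation 1.
TERMS = ("vdw", "as", "h-bonding", "ss", "elec")


def total_energy(energies: Dict[str, float], switches: Dict[str, int]) -> float:
    """Equation 1: the sum over terms of n * E_term, with each n either 0 or 1.

    Terms missing from `switches` are treated as n = 0 (not considered).
    """
    total = 0.0
    for term in TERMS:
        n = switches.get(term, 0)
        if n not in (0, 1):
            raise ValueError(f"coefficient for {term!r} must be 0 or 1, got {n}")
        total += n * energies.get(term, 0.0)
    return total


# Made-up per-term energies for one rotamer at one position (arbitrary units).
EXAMPLE_ENERGIES = {"vdw": -2.5, "as": 1.0, "h-bonding": -1.5, "ss": -0.5, "elec": 0.75}
ALL_ON = {term: 1 for term in TERMS}

if __name__ == "__main__":
    print(total_energy(EXAMPLE_ENERGIES, ALL_ON))                    # every term included
    print(total_energy(EXAMPLE_ENERGIES, {**ALL_ON, "elec": 0}))     # electrostatics off
```

With the made-up values shown, switching the electrostatic coefficient from 1 to 0 changes the total from -2.75 to -3.5.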

[0122] As outlined in U.S. Ser. Nos. 60/061,097, 60/043,464, 60/054,678, 09/127,926 and PCT US98/07254, these scoring functions may be used either alone or in any combination. Once the scoring functions to be used are identified for each variable position, the preferred first step in the computational analysis comprises the determination of the interaction of each possible rotamer with all or part of the remainder of the protein. That is, the energy of interaction, as measured by one or more of the scoring functions, of each possible rotamer at each variable residue position with either the backbone or other rotamers is calculated. In a preferred embodiment, the interaction of each rotamer with the entire remainder of the protein, i.e. both the entire template and all other rotamers, is calculated. However, as outlined above, it is possible to model only a portion of a protein, for example a domain of a larger protein, and thus in some cases not all of the protein need be considered. The term “portion”, as used herein, with regard to a protein refers to a fragment of that protein. This fragment may range in size from 10 amino acid residues to the entire amino acid sequence minus one amino acid. Similarly, the term “portion”, as used herein, with regard to a nucleic acid refers to a fragment of that nucleic acid. This fragment may range in size from 10 nucleotides to the entire nucleic acid sequence minus one nucleotide.

[0123] In a preferred embodiment, the first step of the computational processing is done by calculating two sets of interactions for each rotamer at every position: the interaction of the rotamer side chain with the template or backbone (the “singles” energy), and the interaction of the rotamer side chain with all other possible rotamers at every other position (the “doubles” energy), whether that position is varied or floated. It should be understood that the backbone in this case includes both the atoms of the protein structure backbone and the atoms of any fixed residues, wherein the fixed residues are defined as a particular conformation of an amino acid.

[0124] Thus, “singles” (rotamer/template) energies are calculated for the interaction of every possible rotamer at every variable residue position with the backbone, using some or all of the scoring functions. Thus, for the hydrogen bonding scoring function, every hydrogen bonding atom of the rotamer and every hydrogen bonding atom of the backbone is evaluated, and $E_{\text{HB}}$ is calculated for each possible rotamer at every variable position. Similarly, for the van der Waals scoring function, every atom of the rotamer is compared to every atom of the template (generally excluding the backbone atoms of its own residue), and $E_{\text{vdW}}$ is calculated for each possible rotamer at every variable residue position. In addition, generally no van der Waals energy is calculated if the atoms are connected by three bonds or fewer. For the atomic solvation scoring function, the surface of the rotamer is measured against the surface of the template, and $E_{\text{as}}$ for each possible rotamer at every variable residue position is calculated. The secondary structure propensity scoring function is also considered as a singles energy, and thus the total singles energy may contain an $E_{\text{ss}}$ term. As will be appreciated by those in the art, many of these energy terms will be close to zero, depending on the physical distance between the rotamer and the template position; that is, the farther apart the two moieties, the smaller the magnitude of the energy.

[0125] For the calculation of “doubles” energy (rotamer/rotamer), the interaction energy of each possible rotamer is compared with every possible rotamer at all other variable residue positions. Thus, “doubles” energies are calculated for the interaction of every possible rotamer at every variable residue position with every possible rotamer at every other variable residue position, using some or all of the scoring functions. Thus, for the hydrogen bonding scoring function, every hydrogen bonding atom of the first rotamer and every hydrogen bonding atom of every possible second rotamer is evaluated, and $E_{\text{HB}}$ is calculated for each possible rotamer pair for any two variable positions. Similarly, for the van der Waals scoring function, every atom of the first rotamer is compared to every atom of every possible second rotamer, and $E_{\text{vdW}}$ is calculated for each possible rotamer pair at every pair of variable residue positions. For the atomic solvation scoring function, the surface of the first rotamer is measured against the surface of every possible second rotamer, and $E_{\text{as}}$ for each possible rotamer pair at every pair of variable residue positions is calculated. The secondary structure propensity scoring function need not be run as a “doubles” energy, as it is considered as a component of the “singles” energy. As will be appreciated by those in the art, many of these doubles energy terms will be close to zero, depending on the physical distance between the first rotamer and the second rotamer; that is, the farther apart the two moieties, the smaller the magnitude of the energy.
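To make the bookkeeping of the singles and doubles tables concrete, the sketch below fills both for a van der Waals-like term only, using a 12-6 Lennard-Jones potential with illustrative σ and ε values. It omits hydrogen bonding, solvation, secondary structure propensity, and the bonded-atom and own-residue exclusions described above, and it represents each rotamer as a bare list of atom coordinates, which is a hypothetical data layout rather than the PDA™ implementation.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def lj(r: float, sigma: float = 3.5, epsilon: float = 0.1) -> float:
    """12-6 Lennard-Jones energy at distance r (illustrative sigma/epsilon)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def atoms_energy(atoms_a: List[Point], atoms_b: List[Point]) -> float:
    """Sum of pairwise LJ energies between two atom lists."""
    return sum(lj(math.dist(a, b)) for a in atoms_a for b in atoms_b)


def singles_and_doubles(template: List[Point], rotamers: List[List[List[Point]]]):
    """rotamers[i][r] is the atom list of rotamer r at variable position i.

    Returns singles[i][r] (rotamer/template) and doubles[(i, r, j, s)] for i < j
    (rotamer/rotamer). No bonded-atom exclusions are applied in this sketch.
    """
    singles = [[atoms_energy(rot, template) for rot in position] for position in rotamers]
    doubles: Dict[Tuple[int, int, int, int], float] = {}
    for i, j in itertools.combinations(range(len(rotamers)), 2):
        for r, rot_r in enumerate(rotamers[i]):
            for s, rot_s in enumerate(rotamers[j]):
                doubles[(i, r, j, s)] = atoms_energy(rot_r, rot_s)
    return singles, doubles


# Toy system: one template atom, two variable positions with 2 and 3 one-atom rotamers.
TEMPLATE = [(0.0, 0.0, 0.0)]
ROTAMERS = [
    [[(4.0, 0.0, 0.0)], [(10.0, 0.0, 0.0)]],
    [[(0.0, 4.0, 0.0)], [(0.0, 10.0, 0.0)], [(0.0, 0.0, 4.0)]],
]

if __name__ == "__main__":
    singles, doubles = singles_and_doubles(TEMPLATE, ROTAMERS)
    print(singles)
    print(len(doubles), "rotamer pairs")   # 2 x 3 = 6 entries
```

The rotamer 10 Å from the template atom contributes a far smaller singles energy than the one at 4 Å, reflecting the observation that distant moieties give near-zero terms. Storing doubles only for i < j halves the table, since the rotamer/rotamer interaction is symmetric.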

[0126] In a preferred embodiment, a sequence prediction algorithm (SPA) is used to generate a variable protein sequence comprising a defined energy state for each amino acid position, as is described in Raha, K., et al. (2000) Protein Sci., 9:1106-1119, and U.S. Ser. No. 09/877,695, filed Jun. 8, 2001, entitled “Apparatus and Method for Designing Proteins and Protein Libraries”; both of which are expressly incorporated herein by reference.

[0127] In a preferred embodiment, force field calculations such as SCMF can be used to generate a variable protein sequence comprising a defined energy state for each amino acid position. For SCMF, see Delarue et al., Pac. Symp. Biocomput. 109-21 (1997); Koehl et al., J. Mol. Biol. 239:249 (1994); Koehl et al., Nat. Struct. Biol. 2:163 (1995); Koehl et al., Curr. Opin. Struct. Biol. 6:222 (1996); Koehl et al., J. Mol. Biol. 293:1183 (1999); Koehl et al., J. Mol. Biol. 293:1161 (1999); Lee, J. Mol. Biol. 236:918 (1994); and Vasquez, Biopolymers 36:53-70 (1995); all of which are expressly incorporated by reference. Other force field calculations that can be used to optimize the conformation of a sequence within a computational method, or to generate de novo optimized sequences as outlined herein, include, but are not limited to, OPLS-AA (Jorgensen, et al., J. Am. Chem. Soc. (1996), v 118, pp 11225-11236; Jorgensen, W. L.; BOSS, Version 4.1; Yale University: New Haven, Conn. (1999)); OPLS (Jorgensen, et al., J. Am. Chem. Soc. (1988), v 110, pp 1657ff; Jorgensen, et al., J. Am. Chem. Soc. (1990), v 112, pp 4768ff); UNRES (United Residue Forcefield; Liwo, et al., Protein Science (1993), v 2, pp 1697-1714; Liwo, et al., Protein Science (1993), v 2, pp 1715-1731; Liwo, et al., J. Comp. Chem. (1997), v 18, pp 849-873; Liwo, et al., J. Comp. Chem. (1997), v 18, pp 874-884; Liwo, et al., J. Comp. Chem. (1998), v 19, pp 259-276); Forcefield for Protein Structure Prediction (Liwo, et al., Proc. Natl. Acad. Sci. USA (1999), v 96, pp 5482-5485); ECEPP/3 (Liwo et al., J Protein Chem 1994 May 13(4):375-80); AMBER 1.1 force field (Weiner, et al., J. Am. Chem. Soc. v 106, pp 765-784); AMBER 3.0 force field (U. C. Singh et al., Proc. Natl. Acad. Sci. USA 82:755-759); CHARMM and CHARMM22 (Brooks, et al., J. Comp. Chem. v 4, pp 187-217); cvff3.0 (Dauber-Osguthorpe, et al., (1988) Proteins: Structure, Function and Genetics, v 4, pp 31-47); cff91 (Maple, et al., J. Comp. Chem. v 15, pp 162-182); also, the DISCOVER (cvff and cff91) and AMBER force fields are used in the INSIGHT molecular modeling package (Biosym/MSI, San Diego Calif.) and CHARMM is used in the QUANTA molecular modeling package (Biosym/MSI, San Diego Calif.), all of which are expressly incorporated by reference. In fact, as is outlined below, these force field methods may be used to generate the secondary library directly; that is, no primary library is generated; rather, these methods can be used to generate a probability table from which the secondary library is directly generated, for example by using these force fields during an SCMF calculation.

[0128] Once the singles and doubles energies are calculated and stored, the next step of the computational processing may occur. As outlined in U.S. Ser. No. 09/127,926 and PCT US98/07254, preferred embodiments utilize a Dead End Elimination (DEE) step, and preferably a Monte Carlo step. PDA™, viewed broadly, has three components that may be varied to alter the output (e.g., the primary library): the scoring functions used in the process, the filtering technique, and the sampling technique.

[0129] In a preferred embodiment, the scoring functions may be altered. In a preferred embodiment, the scoring functions outlined above may be biased or weighted in a variety of ways. For example, a bias towards or away from a reference sequence or family of sequences can be applied; for example, a bias towards wild-type or homolog residues may be used. Similarly, the entire protein or a fragment of it may be biased; for example, the active site may be biased towards wild-type residues, or domain residues may be biased towards a particular desired physical property. Furthermore, a bias towards or against increased energy can be generated. Additional scoring function biases include, but are not limited to, applying electrostatic potential gradients or hydrophobicity gradients, adding a substrate or binding partner to the calculation, or biasing towards a desired charge or hydrophobicity.
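A wild-type bias of the kind described in [0129] can be pictured as a bonus subtracted from the energy whenever a biased position keeps its reference residue. The Python sketch below is illustrative only: the per-residue energy table, the `wt_bonus` value, and the function name are hypothetical and do not represent the PDA™ scoring function.

```python
from typing import Dict, Iterable, Optional


def biased_energy(sequence: str, wild_type: str,
                  residue_energy: Dict[str, float],
                  wt_bonus: float = 1.0,
                  positions: Optional[Iterable[int]] = None) -> float:
    """Score a sequence with a toy per-residue energy (lower is better) and
    subtract wt_bonus at each biased position that keeps the wild-type residue.

    positions: 0-based indices to bias (e.g. an active site); None biases all.
    """
    if len(sequence) != len(wild_type):
        raise ValueError("sequence and wild_type must be aligned to equal length")
    biased = set(range(len(sequence))) if positions is None else set(positions)
    total = 0.0
    for i, (aa, wt_aa) in enumerate(zip(sequence, wild_type)):
        total += residue_energy.get(aa, 0.0)  # unknown residues score 0
        if i in biased and aa == wt_aa:
            total -= wt_bonus  # reward keeping the reference residue
    return total
```

A negative `wt_bonus` would bias away from wild-type instead, and restricting `positions` confines the bias to a fragment such as the active site.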

[0130] In addition, in an alternative embodiment, there are a variety of additional scoring functions that may be used. Additional scoring functions include, but are not limited to, torsional potentials, residue pair potentials, or residue entropy potentials. Such additional scoring functions can be used alone, or as functions for processing the library after it is scored initially. For example, a variety of functions derived from data on binding of peptides to MHC (Major Histocompatibility Complex) can be used to rescore a library in order to eliminate proteins containing sequences that can potentially bind to MHC, i.e., potentially immunogenic sequences.

[0131] In a preferred embodiment, a variety of filtering techniques can be used, including, but not limited to, DEE and its related counterparts. Additional filtering techniques include, but are not limited to, branch-and-bound techniques for finding optimal sequences (Gordon and Mayo, Structure Fold. Des. 7:1089-98, 1999) and exhaustive enumeration of sequences. It should be noted, however, that some techniques may also be performed without any filtering; for example, sampling techniques can be used to find good sequences in the absence of filtering.
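DEE and exhaustive enumeration can be contrasted on a toy problem. The Python sketch below implements the Goldstein singles criterion, one common form of DEE, and checks it against exhaustive enumeration of the surviving rotamers. The energy tables are hypothetical, and the sketch is not the procedure of U.S. Ser. No. 09/127,926.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Hypothetical tables: single[i][r] is the self energy of rotamer r at
# position i; pair[(i, j)][r][u] (i < j) is the pair energy with rotamer u at j.
PairTable = Dict[Tuple[int, int], List[List[float]]]


def pair_energy(pair: PairTable, i: int, r: int, j: int, u: int) -> float:
    """Pair energy between rotamer r at i and rotamer u at j (0 if not stored)."""
    if (i, j) in pair:
        return pair[(i, j)][r][u]
    if (j, i) in pair:
        return pair[(j, i)][u][r]
    return 0.0


def goldstein_dee(single: List[List[float]], pair: PairTable) -> List[List[int]]:
    """Repeatedly remove rotamers that fail the Goldstein singles criterion."""
    alive = [list(range(len(s))) for s in single]
    changed = True
    while changed:
        changed = False
        for i in range(len(single)):
            for r in list(alive[i]):
                for t in alive[i]:
                    if t == r:
                        continue
                    # If r scores worse than t even in r's most favourable
                    # neighbour context, r cannot be in the lowest-energy solution.
                    gap = single[i][r] - single[i][t]
                    for j in range(len(single)):
                        if j != i:
                            gap += min(pair_energy(pair, i, r, j, u)
                                       - pair_energy(pair, i, t, j, u)
                                       for u in alive[j])
                    if gap > 0:
                        alive[i].remove(r)
                        changed = True
                        break
    return alive


def total_energy(single: List[List[float]], pair: PairTable,
                 choice: Sequence[int]) -> float:
    e = sum(single[i][c] for i, c in enumerate(choice))
    for i in range(len(choice)):
        for j in range(i + 1, len(choice)):
            e += pair_energy(pair, i, choice[i], j, choice[j])
    return e


def exhaustive_gmec(single: List[List[float]], pair: PairTable,
                    allowed: List[List[int]]) -> Tuple[int, ...]:
    """Exhaustive enumeration over the allowed rotamers at each position."""
    return min(product(*allowed), key=lambda c: total_energy(single, pair, c))
```

Running DEE first shrinks the space that enumeration, branch-and-bound, or a Monte Carlo search must then cover.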

[0132] As will be appreciated by those in the art, once an optimized sequence or set of sequences is generated, a variety of sequence space sampling methods can be used, either in addition to the preferred Monte Carlo methods or instead of a Monte Carlo search. That is, once a sequence or set of sequences is generated, preferred methods utilize sampling techniques to allow the generation of additional, related sequences for testing.

[0133] These sampling methods can include the use of amino acid substitutions, insertions or deletions, or recombinations of one or more sequences. As outlined herein, a preferred embodiment utilizes a Monte Carlo search, which is a series of biased, systematic, or random jumps. However, other sampling techniques can be used, including Boltzmann sampling, genetic algorithm techniques, and simulated annealing. In addition, for all the sampling techniques, the kinds of jumps allowed can be altered (e.g., random jumps to random residues, biased jumps (to or away from wild-type, for example), or jumps to biased residues (to or away from similar residues, for example)). Jumps may also couple multiple residue positions (two residues always change together, or never change together), or may change whole sets of residues to other sequences (e.g., recombination). Similarly, for all the sampling techniques, the criterion for whether a sampling jump is accepted can be altered, to allow broad searches at high temperature and narrow searches close to local optima at low temperatures. See Metropolis et al., J. Chem. Phys. 21:1087, 1953, hereby expressly incorporated by reference.
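The temperature-dependent acceptance criterion cited from Metropolis et al. can be sketched in Python as follows. The energy units, the function names, and the single-point "random jump to a random residue" move are illustrative assumptions.

```python
import math
import random


def metropolis_accept(delta_e: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis criterion: always accept moves that lower the energy; accept
    moves that raise it with probability exp(-delta_e / T)."""
    if delta_e <= 0:
        return True
    if temperature <= 0:
        return False  # zero temperature: purely downhill search
    return rng.random() < math.exp(-delta_e / temperature)


def random_point_jump(seq: str, alphabet: str, rng: random.Random) -> str:
    """One 'random jump to a random residue' at a randomly chosen position."""
    i = rng.randrange(len(seq))
    new_aa = rng.choice([a for a in alphabet if a != seq[i]])
    return seq[:i] + new_aa + seq[i + 1:]
```

At high temperature an uphill jump of a given size is accepted often, which broadens the search; at low temperature such jumps are almost always rejected, which keeps the search close to a local optimum.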

[0134] In addition, it should be noted that the preferred methods of the invention result in a rank-ordered list of sequences; that is, the sequences are ranked on the basis of some objective criteria. However, as outlined herein, it is possible to create a set of non-ordered sequences, for example by generating a probability table directly (for example using SCMF analysis or sequence alignment techniques) that lists sequences without ranking them. The sampling techniques outlined herein can be used in either situation.
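An unranked library can be drawn directly from a probability table. The Python sketch below assumes each position is sampled independently from its own amino-acid distribution. The table values are hypothetical; they happen to echo residues named later in [0161].

```python
import random
from typing import Dict, List


def sample_library(prob_table: List[Dict[str, float]], n: int, seed: int = 0) -> List[str]:
    """Draw n sequences, choosing each position independently from its
    per-position amino-acid weights (no ranking is implied)."""
    rng = random.Random(seed)
    library = []
    for _ in range(n):
        residues = []
        for column in prob_table:
            aas = list(column)
            weights = [column[a] for a in aas]
            residues.append(rng.choices(aas, weights=weights, k=1)[0])
        library.append("".join(residues))
    return library
```

Independence between positions is a simplifying assumption here. Coupled positions would require sampling from joint rather than per-position probabilities.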

[0135] In a preferred embodiment, Boltzmann sampling is done. As will be appreciated by those in the art, the temperature criteria for Boltzmann sampling can be altered to allow broad searches at high temperature and narrow searches close to local optima at low temperatures (see, e.g., Metropolis et al., J. Chem. Phys. 21:1087, 1953).
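The following sketch shows how temperature shapes Boltzmann sampling. It converts a set of hypothetical sequence energies into normalized probabilities, p ∝ exp(−E/T).

```python
import math
from typing import Dict


def boltzmann_weights(energies: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Normalized Boltzmann probabilities p_i proportional to exp(-E_i / T)."""
    e_min = min(energies.values())  # shift by the minimum to avoid overflow
    raw = {k: math.exp(-(e - e_min) / temperature) for k, e in energies.items()}
    z = sum(raw.values())
    return {k: w / z for k, w in raw.items()}
```

At a low temperature nearly all of the probability falls on the lowest-energy sequence. At a high temperature it spreads across higher-energy sequences, so samples cover more of sequence space.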

[0136] In a preferred embodiment, the sampling technique utilizes genetic algorithms, e.g., such as those described by Holland (Adaptation in Natural and Artificial Systems, 1975, Ann Arbor, U. Michigan Press). Genetic algorithm analysis generally takes generated sequences and recombines them computationally, similar to a nucleic acid recombination event, in a manner similar to “gene shuffling”. Thus the “jumps” of genetic algorithm analysis generally are multiple-position jumps. In addition, as outlined below, correlated multiple jumps may also be done. Such jumps can occur with different crossover positions and more than one recombination at a time, and can involve recombination of two or more sequences. Furthermore, deletions or insertions (random or biased) can be done. In addition, as outlined below, genetic algorithm analysis may also be used after the secondary library has been generated.
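The recombination step of a genetic algorithm can be illustrated with multi-point crossover between two aligned sequences. The Python sketch below covers only the crossover operator. Selection, mutation, and insertions or deletions are omitted, and the function name is hypothetical.

```python
import random


def crossover(parent_a: str, parent_b: str, n_points: int, rng: random.Random) -> str:
    """Multi-point crossover between two equal-length sequences: the child
    copies parent_a, then switches to the other parent at each crossover point."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to equal length")
    # Distinct crossover points strictly inside the sequence.
    points = sorted(rng.sample(range(1, len(parent_a)), n_points))
    parents = (parent_a, parent_b)
    child, source, start = [], 0, 0
    for p in points + [len(parent_a)]:
        child.append(parents[source][start:p])
        source, start = 1 - source, p
    return "".join(child)
```

Because every segment is copied as a block, a single crossover changes many positions at once. This corresponds to the multiple-position jumps described above.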

[0137] In a preferred embodiment, the sampling technique utilizes simulated annealing, e.g., such as described by Kirkpatrick et al. [Science, 220:671-680 (1983)]. Simulated annealing alters the cutoff for accepting good or bad jumps by altering the temperature; that is, the stringency of the cutoff changes with the temperature. This allows broad searches of new areas of sequence space at high temperature, alternating with narrow searches at low temperature to explore regions in detail.
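Simulated annealing runs the Metropolis test while lowering the temperature over time. The Python sketch below assumes a geometric cooling schedule and uses a toy energy, the number of mismatches to a hypothetical target sequence. Neither choice comes from the cited work.

```python
import math
import random
from typing import Callable


def simulated_annealing(start: str, alphabet: str, energy: Callable[[str], float],
                        t_start: float = 5.0, t_end: float = 0.01,
                        steps: int = 5000, seed: int = 0) -> str:
    """Single-residue jumps accepted by the Metropolis test while the
    temperature falls geometrically from t_start to t_end; returns the best
    sequence seen."""
    rng = random.Random(seed)
    current, e_current = start, energy(start)
    best, e_best = current, e_current
    cooling = (t_end / t_start) ** (1.0 / max(steps - 1, 1))
    t = t_start
    for _ in range(steps):
        i = rng.randrange(len(current))
        trial = current[:i] + rng.choice(alphabet) + current[i + 1:]
        e_trial = energy(trial)
        delta = e_trial - e_current
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
        t *= cooling  # lower the temperature, tightening the acceptance cutoff
    return best
```

Early, high-temperature steps accept many uphill jumps. Late, low-temperature steps behave almost greedily and refine the current region.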

[0138] In addition, there are computational methods that may be used as described in U.S. Ser. Nos. 09/927,790, 60/352,103, and 60/351,937, all of which are expressly incorporated herein by reference.

[0139] Any protein design cycle can be used individually, in combination with other methods, or in reiterations that combine methods.

[0140] In a preferred embodiment, the methods of the invention involve starting with a target protein and using experimental methods to generate a variant protein. That is, nucleic acid recombination techniques as are known to one of skill in the art are used to experimentally generate the variant proteins of the present invention.

[0141] Thus, use of a nucleic acid recombination method, implementation of a protein design cycle, or a combination of nucleic acid recombination methods and computational processing results in the generation of a variant protein exhibiting altered cofactor specificity. By “variant protein” or “variable protein sequence” herein is meant a protein that differs from the scaffold protein or target protein in at least one amino acid residue.

[0142] In a preferred embodiment, the cofactor specificity of the variant protein is altered compared to the target protein. Target proteins include, but are not limited to, thioredoxin reductase, glutathione reductase, and 2,5-diketo-D-gluconic acid reductase. Two specific amino acid regions have previously been reported to be involved in cofactor specificity (Carugo and Argos, Proteins (1997) 28, 10-28). The first region immediately follows the Gly-rich loop with the motif G-x-G-x-X₁-X₂, and is involved in pyridine nucleotide binding. Originally, it was believed that in proteins specific for NADPH, X₁ and X₂ are a polar residue (Ser/Thr) and Ala, respectively, whereas in proteins specific for NADH, X₁ and X₂ are a hydrophobic residue (Val/Ile) and Gly, respectively. The determination of additional sequences, however, demonstrated significant sequence variability at X₁ and X₂, breaking this original rule for cofactor specificity.
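To make the original X₁/X₂ rule concrete, the Python sketch below scans a sequence for G-x-G-x-X₁-X₂ and applies that rule. As the paragraph notes, the rule does not hold across all sequences, so the output is illustrative rather than predictive. Real motif assignment would also depend on alignment to the Gly-rich loop, which this scan ignores. The example sequences are made up.

```python
import re
from typing import List, Tuple

# G-x-G-x-X1-X2, where x is any residue; the lookahead lets hits overlap.
FIRST_MOTIF = re.compile(r"(?=(G.G.(.)(.)))")


def classify_original_rule(x1: str, x2: str) -> str:
    """Apply the original (since-broken) X1/X2 rule for cofactor preference."""
    if x1 in ("S", "T") and x2 == "A":
        return "NADPH"
    if x1 in ("V", "I") and x2 == "G":
        return "NADH"
    return "unassigned"


def scan_first_motif(sequence: str) -> List[Tuple[int, str, str]]:
    """Return (0-based start, matched motif, predicted cofactor) for each hit."""
    return [(m.start(), m.group(1), classify_original_rule(m.group(2), m.group(3)))
            for m in FIRST_MOTIF.finditer(sequence)]
```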

[0143] The second region is reported as generally corresponding to the region from about amino acid 175 to amino acid 181 in E. coli thioredoxin reductase. In the NADH-dependent bacterial flavoprotein reductases Cp34 and AhpF (Reynolds et al., Biochemistry (2002) 41, 1990-2001), the second motif is reported as H-Q-F-x-x-x-Q and E-F-A-x-x-x-K, respectively. In a mutation study (Scrutton et al., Nature (1990) 343, 38-43; Mittl et al., Protein Sci. (1994) 3, 1504-1514), the NADPH specificity of E. coli glutathione reductase (GR) was switched to NADH by mutation of the second motif to E-M-F-x-x-x-x-P (see picture below).
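The dash-separated motif notation used here, in which x stands for any residue, translates directly into a regular expression. The helper names below are hypothetical and the test sequences are invented.

```python
import re
from typing import Optional


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Translate dash-separated motif notation (x = any residue) into a regex."""
    parts = motif.split("-")
    return re.compile("".join("." if p == "x" else re.escape(p) for p in parts))


def find_motif(sequence: str, motif: str) -> Optional[int]:
    """0-based start of the first match of the motif, or None if absent."""
    m = motif_to_regex(motif).search(sequence)
    return m.start() if m else None
```

For example, `motif_to_regex("H-Q-F-x-x-x-Q")` yields the pattern `HQF...Q`.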

[0144] In a preferred embodiment, a variant thioredoxin reductase is made in which the cofactor specificity is altered. Thioredoxin reductase (TR) is a potent protein disulfide reductase found in most organisms that participates in many thiol-dependent cellular reductive processes. In addition to its ability to effect the reduction of cellular proteins, thioredoxin reductase is recognized to act directly as an antioxidant (e.g., by preventing oxidation of an oxidizable substrate by scavenging reactive oxygen species) or to increase the oxidative stress in a cell by autooxidizing (e.g., generating superoxide radicals through autooxidation).

[0145] Thioredoxins are low molecular weight dithiol proteins that have the ability to reduce disulfides in typical organic compounds such as Ellman's reagent, or disulfides as they exist naturally in a variety of proteins (Holmgren, A. (1981) Trends in Biochemical Science, 6, 26-39). Under normal physiological conditions, following the reduction of a disulfide bond, the oxidized thioredoxin is reduced by thioredoxin reductase, with NADPH as a cofactor. Thioredoxin of a species is typically reduced only by thioredoxin reductase of the same species.

[0146] The active site pocket of the thioredoxin reductases exhibits a highly conserved region across species, as shown in the amino acid alignment of FIG. 1A. This region corresponds to the amino acid region between residues 156 and 181 of the E. coli thioredoxin reductase, or residues 220 and 245 of the Arabidopsis thioredoxin reductase. This highly conserved pocket is mostly responsible for the binding of the cofactor, NADPH. The trans-species variations in the amino acid sequence of thioredoxin reductase appear in the N- and C-terminal regions, i.e., residues 1-155 and residue 182 to the C-terminus of the E. coli thioredoxin reductase, or residues 1-219 and residue 246 to the C-terminus of the Arabidopsis thioredoxin reductase.

[0147] The target proteins used to generate the variant thioredoxin reductases of the present invention may be obtained from any organism including, but not limited to, E. coli, Bacillus subtilis, Mycobacterium leprae, Saccharomyces, Neurospora crassa, Arabidopsis, Homo sapiens, Methanosarcina acetivorans str. C2A, Ureaplasma parvum, Mycoplasma pulmonis, Rickettsia conorii, Spironucleus barkhanus, Listeria innocua, Fusobacterium nucleatum, Methanococcus jannaschii, Mycoplasma genitalium, Haemophilus influenzae, Vibrio cholerae, Listeria monocytogenes, Helicobacter pylori, Methanopyrus kandleri AV19, Schizosaccharomyces pombe, Chlamydophila pneumoniae, Streptococcus pyogenes, Plasmodium falciparum, Mycobacterium tuberculosis, Borrelia burgdorferi, Ralstonia solanacearum, Sinorhizobium meliloti, Caulobacter crescentus CB15, Encephalitozoon cuniculi, Staphylococcus aureus, Clostridium perfringens, Halobacterium sp. NRC-1, Sulfolobus solfataricus, Rickettsia prowazekii, Mesorhizobium loti, Mus musculus, Thermoplasma acidophilum, Sulfolobus tokodaii, Campylobacter jejuni, Chlamydia trachomatis, Aeropyrum pernix, Neisseria meningitidis, Pyrococcus horikoshii, Pyrococcus abyssi, Thermoplasma volcanium, Pyrococcus furiosus, Archaeoglobus fulgidus, Yersinia pestis, Bacillus halodurans, Ureaplasma urealyticum, Methanothermobacter thermautotrophicus, Pyrobaculum aerophilum, Chlamydia muridarum, Treponema pallidum, Streptomyces coelicolor, Brucella melitensis, Agrobacterium tumefaciens, Drosophila melanogaster, Streptococcus pneumoniae, Clostridium acetobutylicum, Xylella fastidiosa, Lactococcus lactis, Thermotoga maritima, Pseudomonas aeruginosa, Salmonella enterica, Nostoc sp., Deinococcus radiodurans, Penicillium chrysogenum, Salmonella typhimurium, Lactobacillus delbrueckii, Clostridium sticklandii, Clostridium litorale, Rattus norvegicus, Coccidioides immitis, Bos taurus, Mycobacterium smegmatis, Synechocystis sp., Carboxydothermus hydrogenoformans, Sus scrofa, and Triticum aestivum.

[0148] In a preferred embodiment, the target proteins used to generate the variant thioredoxin reductases are selected from the TRs of E. coli, Bacillus subtilis, Mycobacterium leprae, Saccharomyces, Neurospora crassa, Arabidopsis, and Homo sapiens; the barley TR found in U.S. Pat. No. 6,380,372, entitled “Barley gene for Thioredoxin and NADP-thioredoxin reductase”, issued Apr. 30, 2002; the rice TR found in WO0198509 as the amino acid sequence of SEQ ID NO:27 therein, with its nucleotide sequence as SEQ ID NO:25 therein; the heat-stable TR from Archaeoglobus fulgidus (gi|2649006) (trxB), which is the protein sequence SEQ ID NO:7 in WO0198509; and the TR from Methanococcus jannaschii (gi|1592167) (trxB), which is SEQ ID NO:6 in WO0198509.

[0149] In a preferred embodiment, the catalytic efficiency of the variant TR proteins is improved for the cofactor NADPH. Preferably, the catalytic efficiency of variant TRs is improved by at least about 5% as compared to wild-type for NADPH. More preferably, the catalytic efficiency of variant TRs is improved by at least about 15% as compared to wild-type for NADPH. More preferably, the catalytic efficiency of variant TRs is improved by at least about 25% as compared to wild-type for NADPH. More preferably, the catalytic efficiency of variant TRs is improved by at least about 50% as compared to wild-type for NADPH. More preferably, the catalytic efficiency of variant TRs is improved by at least about 100% as compared to wild-type for NADPH. More preferably, the catalytic efficiency of variant TRs is improved by at least about 300% as compared to wild-type for NADPH. More preferably, the catalytic efficiency of variant TRs is improved by at least 500% as compared to wild-type for NADPH.

[0150] In a preferred embodiment, the catalytic efficiency of the variant TR proteins is improved for the cofactor NADH. Preferably, the catalytic efficiency of variant TRs is improved by at least about 5% as compared to wild-type for NADH. More preferably, the catalytic efficiency of variant TRs is improved by at least about 15% as compared to wild-type for NADH. More preferably, the catalytic efficiency of variant TRs is improved by at least about 25% as compared to wild-type for NADH. More preferably, the catalytic efficiency of variant TRs is improved by at least about 50% as compared to wild-type for NADH. More preferably, the catalytic efficiency of variant TRs is improved by at least about 100% as compared to wild-type for NADH. More preferably, the catalytic efficiency of variant TRs is improved by at least about 300% as compared to wild-type for NADH. More preferably, the catalytic efficiency of variant TRs is improved by at least about 500% as compared to wild-type for NADH. More preferably, the catalytic efficiency of variant TRs is improved by at least about 1000% as compared to wild-type for NADH. More preferably, the catalytic efficiency of variant TRs is improved by at least about 1300% as compared to wild-type for NADH. More preferably, the catalytic efficiency of variant TRs is improved by at least about 3000% as compared to wild-type for NADH.
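The percentages in [0149] and [0150] compare catalytic efficiencies. The sketch below assumes that catalytic efficiency means kcat/Km and reports the highest NADH tier from [0150] that a variant meets. The kinetic constants in the test are hypothetical, and the tiers are treated as exact cutoffs even though the text says "about".

```python
from typing import Optional, Sequence

NADH_TIERS = (5, 15, 25, 50, 100, 300, 500, 1000, 1300, 3000)  # percent, per [0150]


def catalytic_efficiency(kcat: float, km: float) -> float:
    """kcat/Km, e.g. in s^-1 uM^-1 when kcat is in s^-1 and Km in uM."""
    return kcat / km


def percent_improvement(variant_eff: float, wild_type_eff: float) -> float:
    """Percent increase of the variant's efficiency over wild-type."""
    return (variant_eff / wild_type_eff - 1.0) * 100.0


def highest_tier(improvement: float, tiers: Sequence[float] = NADH_TIERS) -> Optional[float]:
    """Largest listed tier the improvement reaches, or None if below all tiers."""
    met = [t for t in tiers if improvement >= t]
    return max(met) if met else None
```

For example, a 16-fold rise in kcat/Km is a 1500% improvement, which falls in the "at least about 1300%" tier.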

[0151] In a preferred embodiment, the cofactor specificity of the variant thioredoxin reductase is altered such that there is an increased activity using NADH. Preferably, variant thioredoxin reductases (TRs) will have at least 50% of native NADPH-dependent activity using NADH. More preferably, variant TRs will have at least 75% of native NADPH-dependent activity using NADH. More preferably, variant TRs will have at least 85% of native NADPH-dependent activity using NADH. More preferably, variant TRs will have at least 95% of native NADPH-dependent activity using NADH. More preferably, variant TRs will have at least 100% of native NADPH-dependent activity using NADH.

[0152] In a preferred embodiment, the cofactor specificity of the variant thioredoxin reductase is altered such that there is a cofactor switch from NADPH to NADH. In other words, these variants will have an increase in NADH-dependent activity and a substantially simultaneous decrease in NADPH-dependent activity. Preferably, variant thioredoxin reductases (TRs) will have at least 50% of native NADPH-dependent activity using NADH. More preferably, variant TRs will have at least 75% of native NADPH-dependent activity using NADH. More preferably, variant TRs will have at least 85% of native NADPH-dependent activity using NADH. More preferably, variant TRs will have at least 95% of native NADPH-dependent activity using NADH. More preferably, variant TRs will have at least 100% of native NADPH-dependent activity using NADH.

[0153] Preferably, variant thioredoxin reductases (TRs) will have less than about 95% of native NADPH-dependent activity. More preferably, TRs will have less than about 75% of native NADPH-dependent activity. More preferably, TRs will have less than about 50% of native NADPH-dependent activity. More preferably, TRs will have less than about 30% of native NADPH-dependent activity. More preferably, TRs will have less than about 25% of native NADPH-dependent activity. More preferably, TRs will have less than about 20% of native NADPH-dependent activity. More preferably, TRs will have less than about 5% of native NADPH-dependent activity. More preferably, TRs will have less than about 0.5% of native NADPH-dependent activity.
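Paragraphs [0151] through [0153] express activities as a percentage of the wild-type enzyme's activity with NADPH. The Python sketch below checks a hypothetical variant against one NADH level and one residual-NADPH level. The default thresholds of 50% are picked from the listed tiers for illustration, and the activity values in the test are invented, in arbitrary but consistent units.

```python
def percent_of_native(activity: float, native_nadph_activity: float) -> float:
    """Activity expressed as a percentage of wild-type NADPH-dependent activity."""
    return 100.0 * activity / native_nadph_activity


def is_cofactor_switch(variant_nadh: float, variant_nadph: float, native_nadph: float,
                       min_nadh_pct: float = 50.0, max_nadph_pct: float = 50.0) -> bool:
    """True if NADH-dependent activity reaches min_nadh_pct of native activity
    while NADPH-dependent activity falls below max_nadph_pct of it."""
    return (percent_of_native(variant_nadh, native_nadph) >= min_nadh_pct
            and percent_of_native(variant_nadph, native_nadph) < max_nadph_pct)
```

Tightening either threshold corresponds to moving further down the "more preferably" ladders.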

[0154] In another embodiment, the catalytic efficiency of the variant TR proteins is improved for both co-factors, NADH and NADPH, together. Preferably, the catalytic efficiency of the TR variants is improved by at least about 5% as compared to wild-type for either of the two co-factors. More preferably, the catalytic efficiency of the TR variants is improved by at least about 50% as compared to wild-type for either of the two co-factors. More preferably, the catalytic efficiency of the TR variants is improved by at least about 100% as compared to wild-type for either of the two co-factors. More preferably, the catalytic efficiency of the TR variants is improved by at least about 300% as compared to wild-type for either of the two co-factors. More preferably, the catalytic efficiency of the TR variants is improved by at least about 1000% as compared to wild-type for either of the two co-factors. More preferably, the catalytic efficiency of the TR variants is improved by at least about 2000% as compared to wild-type for either of the two co-factors.

[0155] In a preferred embodiment, the NADPH binding affinity of the variant thioredoxin reductases (TRs) may be unaffected, reduced, or enhanced. For example, in some embodiments, variant TRs show greater than 100 times more affinity for NADPH than for NADH, while in other embodiments, variant TRs show greater than 50 times more affinity for NADPH than for NADH, or greater than 25 times more affinity for NADPH than for NADH.
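Statements such as "greater than 100 times more affinity for NADPH than for NADH" can be made concrete as a ratio of binding constants. The sketch assumes that affinity is quantified as 1/Kd, so the fold preference is Kd(NADH)/Kd(NADPH). The text does not specify the measure, and the Kd values in the test are hypothetical.

```python
from typing import List, Sequence


def fold_preference(kd_nadph: float, kd_nadh: float) -> float:
    """Times tighter NADPH binding than NADH binding, taking affinity as 1/Kd
    (both Kd values in the same units)."""
    return kd_nadh / kd_nadph


def thresholds_exceeded(ratio: float, thresholds: Sequence[float] = (25, 50, 100)) -> List[float]:
    """Which of the 'greater than N times' levels in [0155] the ratio exceeds."""
    return [t for t in thresholds if ratio > t]
```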

[0156] In a preferred embodiment, the ability of the variant TR protein to reduce its cognate thioredoxin is not substantially affected.

[0157] In a preferred embodiment, the substrate specificity of the variant TR protein is altered such that the TR protein may act on a thioredoxin protein from another species.

[0158] In some embodiments, potential glycosylation sites added by a protein design cycle may be removed without affecting activity by using a second protein design cycle.

[0159] In a preferred embodiment, the variant TR proteins have from 1 to 3 amino acid substitutions in amino acid regions involved in cofactor specificity as compared to the wild-type TR proteins. In other embodiments, the variant TR proteins have additional amino acid substitutions at other positions. Thus, variant TR proteins may have at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 different residues in other positions. As will be appreciated by those of skill in the art, the number of additional positions that may have amino acid substitutions will depend on the wild-type TR protein used to generate the variants. Thus, in some instances, up to 50 different positions may have amino acid substitutions.
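Counting substitutions inside and outside the cofactor-specificity region, as described in [0159], is a position-by-position comparison. The sketch assumes the wild-type and variant sequences are already aligned to equal length with no insertions or deletions. The sequences and region positions in the test are hypothetical.

```python
from typing import Iterable, List, Tuple

Substitution = Tuple[int, str, str]  # (1-based position, wild-type, variant)


def substitutions(wild_type: str, variant: str) -> List[Substitution]:
    """List every mismatch between two pre-aligned, equal-length sequences."""
    if len(wild_type) != len(variant):
        raise ValueError("sequences must be aligned to equal length")
    return [(i + 1, w, v) for i, (w, v) in enumerate(zip(wild_type, variant)) if w != v]


def split_by_region(subs: List[Substitution], cofactor_positions: Iterable[int]
                    ) -> Tuple[List[Substitution], List[Substitution]]:
    """Separate substitutions inside the cofactor region from those elsewhere."""
    region = set(cofactor_positions)
    inside = [s for s in subs if s[0] in region]
    outside = [s for s in subs if s[0] not in region]
    return inside, outside
```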

[0160] In a preferred embodiment, the variant TR proteins comprise amino acid substitutions at positions selected from A4, A5, and A6, corresponding to positions 190, 191, and 195 in the Arabidopsis NTR protein (GenBank accession no. Q39242), positions 156, 157, and 175 in the E. coli TR protein (GenBank accession no. P09625), positions 155, 156, and 174 in the Bacillus subtilis TR protein (GenBank accession no. P80880), positions 163, 164, and 182 in the Mycobacterium leprae TR protein (GenBank accession no. P46843), positions 164, 165, and 183 in the Saccharomyces TR protein (GenBank accession nos. P29509 and P38816), positions 163, 164, and 182 in the Neurospora crassa TR protein (GenBank accession no. P51978), positions 170, 171, and 189 in the Arabidopsis TR protein (GenBank accession no. Q39243), and positions 217, 218, and 249 in the human TR protein (GenBank accession no. Q16881).

[0161] In a preferred embodiment, the variant TR proteins comprise amino acid substitutions selected from the group of substitutions consisting of RA4W, RA5L, RA5M, RA5I, RA5F, RA5V, RA5Y, RA5A, RA5S, RA5C, RA5T, RA6T, RA6S, RA6Q, RA6G, RA6N, RA6D, RA6M, and RA6E.
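The substitution names in [0161] read as wild-type residue, formula position, then new residue; for example, RA4W means arginine at A4 replaced by tryptophan. The sketch below parses that notation and applies it using the Arabidopsis NTR residue numbers for A4, A5, and A6 given in [0160]. The test sequence is a placeholder, not the Q39242 sequence, and the function names are hypothetical.

```python
import re
from typing import Dict, Iterable, Tuple

# Residue numbers for A4, A5, A6 in Arabidopsis NTR (GenBank Q39242), per [0160].
ARABIDOPSIS_NTR = {"A4": 190, "A5": 191, "A6": 195}

NOTATION = re.compile(r"^([A-Z])\s*(A[1-6])([A-Z])$")


def parse_substitution(code: str) -> Tuple[str, str, str]:
    """Split e.g. 'RA4W' (or 'R A5M') into (wild-type residue, site, new residue)."""
    m = NOTATION.match(code.strip())
    if not m:
        raise ValueError(f"unrecognized substitution: {code!r}")
    return m.group(1), m.group(2), m.group(3)


def apply_substitutions(sequence: str, codes: Iterable[str],
                        numbering: Dict[str, int]) -> str:
    """Apply substitutions, checking the expected wild-type residue first."""
    residues = list(sequence)
    for code in codes:
        wt, site, new = parse_substitution(code)
        idx = numbering[site] - 1  # residue numbers are 1-based
        if residues[idx] != wt:
            raise ValueError(f"{code}: expected {wt} at residue {numbering[site]}, "
                             f"found {residues[idx]}")
        residues[idx] = new
    return "".join(residues)
```

Checking the wild-type residue before mutating guards against applying a substitution with the wrong numbering scheme.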

[0162] In a preferred embodiment, the variant TR protein comprises the amino acid substitutions RA4W and RA6T.

[0163] In a preferred embodiment, the variant TR protein comprises the amino acid substitutions RA4W, RA5L, and RA6S.

[0164] In a preferred embodiment, the variant TR protein comprises the amino acid substitutions RA5Y and RA6N.

[0165] In a preferred embodiment, the variant TR protein comprises the amino acid substitutions RA4W, RA5F, and RA6Q.

[0166] In a preferred embodiment, the variant TR protein comprises the amino acid substitutions RA4W, RA5L, and RA6T.

[0167] In a preferred embodiment, the variant TR protein comprises the amino acid substitutions RA4W and RA6S.

[0168] In a preferred embodiment, the variant TR protein comprises the amino acid substitutions RA5Y and RA6N.

[0169] In a preferred embodiment, the variant TR protein comprises the amino acid substitutions RA5F and RA6N.

[0170] In a preferred embodiment, the variant TR protein comprises the amino acid substitutions RA4W and RA6T.

[0171] In a preferred embodiment, the variant TR protein comprises the amino acid substitutions RA4W, RA5L, and RA6S.

[0172] In a preferred embodiment, the variant TR protein comprises the amino acid substitutions RA4W, RA5M, and RA6S.

[0173] In a preferred embodiment, the variant TR protein comprises the amino acid substitutions RA4W, RA5I, and RA6S.

[0174] In a preferred embodiment, the variant TR protein comprises the amino acid substitutions RA4W, RA5F, and RA6Q.

[0175] In a preferred embodiment, the variant TR protein comprises the amino acid substitutions RA4W and RA5V.

[0176] In a preferred embodiment, the variant TR protein comprises the amino acid substitutions RA4W, RA5M, and RA6G.

[0177] In a preferred embodiment, the variant TR protein comprises the amino acid substitutions RA4W, RA5V, and RA6G.

[0177] In a preferred embodiment, the variant TR protein comprises theamino acid substitutions RA4W, RA5V, and RA6G.

[0178] In a preferred embodiment, the variant protein is a polypeptide molecule of Formula I.

S₁-A₁-A₂-S₂-A₃-A₄-A₅-S₃-A₆-S₄  (I)

[0179] where

[0180] a) S₁ comprises a polypeptide sequence selected from the group consisting of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, and SEQ ID NO:7, or a sequence having substantial similarity thereto;

[0181] b) S₂ comprises a polypeptide sequence selected from the group consisting of SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, and SEQ ID NO:14, or a sequence having substantial similarity thereto;

[0182] c) S₃ comprises a polypeptide sequence selected from the group consisting of SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, and SEQ ID NO:21, or a sequence having substantial similarity thereto;

[0183] d) S₄ comprises a polypeptide sequence selected from the group consisting of SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, and SEQ ID NO:28, or a sequence having substantial similarity thereto;

[0184] e) A₁ is an amino acid moiety selected from the group consisting of serine, valine, glycine, alanine, leucine, isoleucine, methionine, phenylalanine, and tryptophan;

[0185] f) A₂ is an amino acid moiety selected from the group consisting of alanine, glycine, valine, leucine, isoleucine, methionine, phenylalanine, and tryptophan;

[0186] g) A₃ is an amino acid moiety selected from the group consisting of histidine, aspartic acid, glutamic acid, arginine, leucine, serine, threonine, cysteine, asparagine, glutamine, and tyrosine;

[0187] h) A₄ is an amino acid moiety selected from the group consisting of arginine, alanine, glycine, valine, leucine, isoleucine, methionine, phenylalanine, and tryptophan;

[0188] i) A₅ is an amino acid moiety selected from the group consisting of arginine, asparagine, glutamine, aspartic acid, glutamic acid, cysteine, serine, threonine, and lysine;

[0189] j) A₆ is an amino acid moiety selected from the group consisting of arginine, glutamic acid, asparagine, glutamine, aspartic acid, cysteine, serine, threonine, and lysine;

[0190] provided that at least one of the following applies:

[0191] A₁ is not serine;

[0192] A₂ is not alanine;

[0193] A₃ is not histidine;

[0194] A₄ is not arginine;

[0195] A₅ is not arginine; or

[0196] A₆ is not arginine.
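The residue conditions of Formula I can be checked mechanically. The Python sketch below encodes the allowed sets from [0184] to [0189] as one-letter codes and the proviso from [0190] to [0196]. It checks only the six single-residue positions A₁ to A₆. It does not assess whether S₁ to S₄ match, or are substantially similar to, the listed SEQ ID NOs.

```python
from typing import Dict

# Allowed residues at A1..A6 (one-letter codes), per [0184]-[0189].
ALLOWED = {
    "A1": set("SVGALIMFW"),
    "A2": set("AGVLIMFW"),
    "A3": set("HDERLSTCNQY"),
    "A4": set("RAGVLIMFW"),
    "A5": set("RNQDECSTK"),
    "A6": set("RENQDCSTK"),
}
# Proviso [0190]-[0196]: at least one site must differ from these residues.
PROVISO = {"A1": "S", "A2": "A", "A3": "H", "A4": "R", "A5": "R", "A6": "R"}


def satisfies_formula_i(choice: Dict[str, str]) -> bool:
    """True if every A-site residue is in its allowed set and at least one
    site differs from the residue named in the proviso."""
    if set(choice) != set(ALLOWED):
        raise ValueError("expected exactly the sites A1..A6")
    in_sets = all(choice[k] in ALLOWED[k] for k in ALLOWED)
    differs = any(choice[k] != PROVISO[k] for k in ALLOWED)
    return in_sets and differs
```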

[0197] In Formula I, above, the sequence A₁-A₂-S₂-A₃-A₄-A₅-S₃-A₆ corresponds to a highly conserved pocket in the sequence of thioredoxin reductase proteins obtained from various species. A₁ corresponds to residue 156 in the E. coli thioredoxin reductase sequence, residue 155 in the Bacillus subtilis thioredoxin reductase sequence, residue 163 in the Mycobacterium leprae thioredoxin reductase sequence, residue 164 in the Saccharomyces thioredoxin reductase sequence, residue 163 in the Neurospora crassa thioredoxin reductase sequence, residue 170 in the Arabidopsis thioredoxin reductase sequence, and residue 217 in the human thioredoxin reductase sequence. In the wild-type protein, this residue is threonine for E. coli and human, and serine for the other listed species.

[0198] A₂ corresponds to residue 157 in the E. coli thioredoxin reductase sequence, residue 156 in the Bacillus subtilis thioredoxin reductase sequence, residue 164 in the Mycobacterium leprae thioredoxin reductase sequence, residue 165 in the Saccharomyces thioredoxin reductase sequence, residue 164 in the Neurospora crassa thioredoxin reductase sequence, residue 171 in the Arabidopsis thioredoxin reductase sequence, and residue 218 in the human thioredoxin reductase sequence. In the wild-type protein, this residue is valine for human and alanine for all the other listed species.

[0199] A₃ corresponds to residue 175 in the E. coli thioredoxin reductase sequence, residue 174 in the Bacillus subtilis thioredoxin reductase sequence, residue 182 in the Mycobacterium leprae thioredoxin reductase sequence, residue 183 in the Saccharomyces thioredoxin reductase sequence, residue 182 in the Neurospora crassa thioredoxin reductase sequence, residue 189 in the Arabidopsis thioredoxin reductase sequence, and residue 249 in the human thioredoxin reductase sequence. In the wild-type protein, this residue is arginine for human, valine for Saccharomyces and Neurospora crassa, and histidine for all the other listed species.

[0200] A₄ corresponds to residue 176 in the E. coli thioredoxin reductase sequence, residue 175 in the Bacillus subtilis thioredoxin reductase sequence, residue 183 in the Mycobacterium leprae thioredoxin reductase sequence, residue 184 in the Saccharomyces thioredoxin reductase sequence, residue 183 in the Neurospora crassa thioredoxin reductase sequence, residue 190 in the Arabidopsis thioredoxin reductase sequence, and residue 250 in the human thioredoxin reductase sequence. In the wild-type protein, this residue is glutamine for human and arginine for all the other listed species.

[0201] A₅ corresponds to residue 177 in the E. coli thioredoxin reductase sequence, residue 176 in the Bacillus subtilis thioredoxin reductase sequence, residue 184 in the Mycobacterium leprae thioredoxin reductase sequence, residue 185 in the Saccharomyces thioredoxin reductase sequence, residue 184 in the Neurospora crassa thioredoxin reductase sequence, residue 191 in the Arabidopsis thioredoxin reductase sequence, and residue 251 in the human thioredoxin reductase sequence. In the wild-type protein, this residue is lysine for Saccharomyces and Neurospora crassa, phenylalanine for human, and arginine for all the other listed species.

[0202] A₆ corresponds to residue 181 in the E. coli thioredoxin reductase sequence, residue 180 in the Bacillus subtilis thioredoxin reductase sequence, residue 188 in the Mycobacterium leprae thioredoxin reductase sequence, residue 189 in the Saccharomyces thioredoxin reductase sequence, residue 188 in the Neurospora crassa thioredoxin reductase sequence, residue 195 in the Arabidopsis thioredoxin reductase sequence, and residue 255 in the human thioredoxin reductase sequence. In the wild-type protein, this residue is lysine for human and arginine for all the other listed species.
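The residue numbers and wild-type residues in [0197] to [0202] form a small lookup table. The Python sketch below encodes them so that an A-site can be converted to a species-specific residue number or translated between species. Translation is only defined for these six listed sites, and the species keys are shorthand labels chosen for the sketch.

```python
SITES = ("A1", "A2", "A3", "A4", "A5", "A6")

# Residue numbers at A1..A6, per [0197]-[0202].
POSITIONS = {
    "E. coli":       (156, 157, 175, 176, 177, 181),
    "B. subtilis":   (155, 156, 174, 175, 176, 180),
    "M. leprae":     (163, 164, 182, 183, 184, 188),
    "Saccharomyces": (164, 165, 183, 184, 185, 189),
    "N. crassa":     (163, 164, 182, 183, 184, 188),
    "Arabidopsis":   (170, 171, 189, 190, 191, 195),
    "Human":         (217, 218, 249, 250, 251, 255),
}
# Wild-type one-letter residues at A1..A6, from the same paragraphs.
WILD_TYPE = {
    "E. coli": "TAHRRR",
    "B. subtilis": "SAHRRR",
    "M. leprae": "SAHRRR",
    "Saccharomyces": "SAVRKR",
    "N. crassa": "SAVRKR",
    "Arabidopsis": "SAHRRR",
    "Human": "TVRQFK",
}


def residue_number(species: str, site: str) -> int:
    return POSITIONS[species][SITES.index(site)]


def translate(residue: int, from_species: str, to_species: str) -> int:
    """Map a residue number at one of the A-sites to the other species."""
    idx = POSITIONS[from_species].index(residue)  # ValueError if not an A-site
    return POSITIONS[to_species][idx]


def wild_type_at(species: str, site: str) -> str:
    return WILD_TYPE[species][SITES.index(site)]
```

For example, E. coli residue 176 (A₄) corresponds to residue 250 in the human enzyme, where the wild-type residue is glutamine.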

[0203] It has been observed that, among the species mentioned above, the portions of the amino acid sequence corresponding to S₂ and S₃ are also highly conserved. S₂ comprises a polypeptide sequence selected from the group consisting of SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, and SEQ ID NO:14. S₃ comprises a polypeptide sequence selected from the group consisting of SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, and SEQ ID NO:21 (FIG. 2).

[0204] Therefore, embodiments of the invention relate to a polypeptideof Formula I, where S₁ consists of a polypeptide sequence having thesequence selected from the group consisting of SEQ ID NO:1, SEQ ID NO:2,SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, and SEQ ID NO:7.

[0205] In certain embodiments, S₂ consists of a polypeptide sequenceselected from the group consisting of SEQ ID NO:8, SEQ ID NO:9, SEQ IDNO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, and SEQ ID NO:14,whereas S₃ consists of a polypeptide sequence selected from the groupconsisting of SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18,SEQ ID NO:19, SEQ ID NO:20, and SEQ ID NO:21. Other embodiments of theinvention relate to S₄ consisting of a polypeptide sequence having thesequence selected from the group consisting of SEQ ID NO:22, SEQ IDNO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, and SEQID NO:28.

[0206] In one embodiment, in the polypeptide of Formula I, S₁ is the polypeptide sequence set forth in SEQ ID NO:1, S₂ is the polypeptide sequence set forth in SEQ ID NO:8, S₃ is the polypeptide sequence set forth in SEQ ID NO:15, and S₄ is the polypeptide sequence set forth in SEQ ID NO:22. This corresponds to a thioredoxin reductase protein, or a mutant thereof, obtained from E. coli.

[0207] In another embodiment, in the polypeptide of Formula I, S₁ is the polypeptide sequence set forth in SEQ ID NO:2, S₂ is the polypeptide sequence set forth in SEQ ID NO:9, S₃ is the polypeptide sequence set forth in SEQ ID NO:16, and S₄ is the polypeptide sequence set forth in SEQ ID NO:23. This corresponds to a thioredoxin reductase protein, or a mutant thereof, obtained from Bacillus subtilis.

[0208] In yet another embodiment, in the polypeptide of Formula I, S₁ is the polypeptide sequence set forth in SEQ ID NO:3, S₂ is the polypeptide sequence set forth in SEQ ID NO:10, S₃ is the polypeptide sequence set forth in SEQ ID NO:17, and S₄ is the polypeptide sequence set forth in SEQ ID NO:24. This corresponds to a thioredoxin reductase protein, or a mutant thereof, obtained from Mycobacterium leprae.

[0209] Another embodiment of the invention relates to a polypeptide of Formula I, in which S₁ is the polypeptide sequence set forth in SEQ ID NO:4, S₂ is the polypeptide sequence set forth in SEQ ID NO:11, S₃ is the polypeptide sequence set forth in SEQ ID NO:18, and S₄ is the polypeptide sequence set forth in SEQ ID NO:25. This corresponds to a thioredoxin reductase protein, or a mutant thereof, obtained from Saccharomyces.

[0210] In another embodiment, in the polypeptide of Formula I, S₁ is the polypeptide sequence set forth in SEQ ID NO:5, S₂ is the polypeptide sequence set forth in SEQ ID NO:12, S₃ is the polypeptide sequence set forth in SEQ ID NO:19, and S₄ is the polypeptide sequence set forth in SEQ ID NO:26. This corresponds to a thioredoxin reductase protein, or a mutant thereof, obtained from Neurospora crassa.

[0211] In one embodiment, in the polypeptide of Formula I, S₁ is the polypeptide sequence set forth in SEQ ID NO:6, S₂ is the polypeptide sequence set forth in SEQ ID NO:13, S₃ is the polypeptide sequence set forth in SEQ ID NO:20, and S₄ is the polypeptide sequence set forth in SEQ ID NO:27. This corresponds to a thioredoxin reductase protein, or a mutant thereof, obtained from Arabidopsis.

[0212] The invention also relates to another polypeptide of Formula I, in which S₁ is the polypeptide sequence set forth in SEQ ID NO:7, S₂ is the polypeptide sequence set forth in SEQ ID NO:14, S₃ is the polypeptide sequence set forth in SEQ ID NO:21, and S₄ is the polypeptide sequence set forth in SEQ ID NO:28. This corresponds to a thioredoxin reductase protein, or a mutant thereof, obtained from humans.

[0213] The invention encompasses certain mutants of the naturally occurring thioredoxin reductase proteins. These mutants include those in which A₁ is an amino acid moiety selected from the group consisting of valine, alanine, and leucine; A₂ is an amino acid moiety selected from the group consisting of glycine, valine, and leucine; A₃ is an amino acid moiety selected from the group consisting of aspartic acid, glutamic acid, asparagine, and glutamine; A₄ is an amino acid moiety selected from the group consisting of alanine, glycine, valine, leucine, isoleucine, and methionine; A₅ is an amino acid moiety selected from the group consisting of asparagine, glutamine, aspartic acid, and glutamic acid; and A₆ is an amino acid moiety selected from the group consisting of glutamic acid, glutamine, aspartic acid, and asparagine.

[0214] It is understood that a polypeptide of the present invention may have one or more of the above mutations.

[0215] In certain embodiments, A₁ is valine; in others, A₂ is glycine; in others, A₃ is aspartic acid; in others, A₄ is alanine; in others, A₅ is asparagine; and in others, A₆ is glutamic acid. In some embodiments, two or more of these particular amino acid residues are present at their specified positions.
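As an illustration of how the substitution options of [0213] to [0215] combine, the following minimal Python sketch enumerates them. It assumes the six positions can be mutated independently; the names `ALLOWED`, `all_six_mutated` and the counting helpers are hypothetical, and the sequence context around A₁ to A₆ is not modeled.

```python
from itertools import product
from math import prod

# Residues permitted at positions A1-A6 of Formula I (one-letter codes),
# transcribed from the list of mutants in [0213].
ALLOWED = {
    "A1": ("V", "A", "L"),
    "A2": ("G", "V", "L"),
    "A3": ("D", "E", "N", "Q"),
    "A4": ("A", "G", "V", "L", "I", "M"),
    "A5": ("N", "Q", "D", "E"),
    "A6": ("E", "Q", "D", "N"),
}


def all_six_mutated(allowed=ALLOWED):
    """Yield every combination in which all six positions carry a listed residue."""
    positions = list(allowed)
    for residues in product(*(allowed[p] for p in positions)):
        yield dict(zip(positions, residues))


def count_all_six_mutated(allowed=ALLOWED):
    return prod(len(choices) for choices in allowed.values())


def count_one_or_more_mutated(allowed=ALLOWED):
    # Each position is either left wild-type or given one listed residue; the
    # "- 1" drops the case where nothing is mutated. Assumes the wild-type
    # residue at each position is not itself one of the listed options.
    return prod(len(choices) + 1 for choices in allowed.values()) - 1


print(count_all_six_mutated())      # 3456
print(count_one_or_more_mutated())  # 13999
```

Even this small set of options yields thousands of combinations, which is one reason the library-based approaches described later in this section are attractive.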

[0216] In a preferred embodiment, the variant proteins of the present invention may be fused to a second protein. For example, a fusion protein comprising the polypeptide of Formula I and a second polypeptide may be made. The second polypeptide may be a wild-type TR protein, wild-type thioredoxin, or a variant designed by a protein design cycle. Alternatively, a variant protein generated by a protein design cycle may be fused to a second polypeptide, which may be a wild-type TR protein, wild-type thioredoxin, or the polypeptide of Formula I. Such fusion may be through a linker.

[0217] By “linker”, “linker sequence”, “spacer”, “tethering sequence” or grammatical equivalents thereof, herein is meant a molecule or group of molecules (such as a monomer or polymer) that connects two molecules and often serves to place the two molecules in a preferred configuration. In one aspect of this embodiment, the linker is a peptide bond. Choosing a suitable linker for a specific case where two polypeptide chains are to be connected depends on various parameters, e.g., the nature of the two polypeptide chains (e.g., whether they naturally form a dimer or not), the distance between the N- and the C-termini to be connected if known from three-dimensional structure determination, and/or the stability of the linker towards proteolysis and oxidation. Furthermore, the linker may contain amino acid residues that provide flexibility. Thus, the linker peptide may predominantly include the following amino acid residues: Gly, Ser, Ala, or Thr.

[0218] The linker peptide should have a length that is adequate to link two monomers in such a way that they assume the correct conformation relative to one another so that they retain the desired activity as antagonists of a given receptor. Suitable lengths for this purpose include at least one and not more than 30 amino acid residues. Preferably, the linker is from about 1 to 30 amino acids in length, with linkers of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20 amino acids in length being preferred. See also WO 01/25277, incorporated herein by reference in its entirety.

[0219] In addition, the amino acid residues selected for inclusion in the linker peptide should exhibit properties that do not interfere significantly with the activity of the polypeptide. Thus, the linker peptide on the whole should not exhibit a charge that would be inconsistent with the activity of the polypeptide, or interfere with internal folding, or form bonds or other interactions with amino acid residues in one or more of the monomers that would seriously impede the binding of receptor monomer domains.

[0220] Useful linkers include glycine-serine polymers (including, for example, (GS)_n, (GSGGS)_n, (GGGGS)_n and (GGGS)_n, where n is an integer of at least one), glycine-alanine polymers, alanine-serine polymers, and other flexible linkers such as the tether for the shaker potassium channel, as well as a large variety of other flexible linkers, as will be appreciated by those in the art. Glycine-serine polymers are preferred for several reasons. First, both of these amino acids are relatively unstructured and therefore may be able to serve as a neutral tether between components. Second, serine is hydrophilic and therefore able to solubilize what could be a globular glycine chain. Third, similar chains have been shown to be effective in joining subunits of recombinant proteins such as single chain antibodies.
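The glycine-serine polymers listed in [0220] can be generated mechanically. The sketch below uses hypothetical helper names that are not part of the disclosure; it builds (unit)_n linkers and lists those within the 30-residue upper bound stated in [0218].

```python
# Repeat units of the glycine-serine polymers listed in [0220].
GS_UNITS = ("GS", "GSGGS", "GGGGS", "GGGS")


def gs_linker(unit: str, n: int) -> str:
    """Return the linker (unit)_n as a one-letter sequence."""
    if n < 1:
        raise ValueError("n must be an integer of at least one")
    return unit * n


def linkers_up_to(max_len: int = 30):
    """List (unit, n, sequence) for every GS linker of at most max_len residues."""
    options = []
    for unit in GS_UNITS:
        n = 1
        while len(unit) * n <= max_len:
            options.append((unit, n, gs_linker(unit, n)))
            n += 1
    return options


print(gs_linker("GGGGS", 3))  # GGGGSGGGGSGGGGS
print(len(linkers_up_to()))   # 34
```

Each candidate would still need to be tested for the folding and activity criteria of [0219]; the helper only enumerates sequences.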

[0221] Suitable linkers may also be identified by screening databases of known three-dimensional structures for naturally occurring motifs that can bridge the gap between two polypeptide chains. Another way of obtaining a suitable linker is by optimizing a simple linker, e.g., (Gly₄Ser)_n, through random mutagenesis.

[0222] In a preferred embodiment, the linker may comprise a polypeptide sequence having between about 5 and about 50 amino acids, or between about 10 and about 40 amino acids, or between about 15 and about 25 amino acids. In a preferred embodiment, the linker is about 22 amino acids.

[0223] In a preferred embodiment, the variant proteins of the present invention may be fused to a third polypeptide, and again, such fusion may be through a linker. The linker between the fusion polypeptide, which includes the polypeptide of Formula I, and the third polypeptide may have a molecular weight between about 5 and about 100 kDa, between about 20 and about 70 kDa, or even between about 25 and about 45 kDa. In a preferred embodiment, the linker has a molecular weight between about 30 and about 40 kDa. In certain embodiments, this linker comprises negatively charged amino acid residues, such as glutamate and aspartate.
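To put the 30 to 40 kDa linker range of [0223] in perspective, a rough mass estimate can be computed from average residue masses. This sketch assumes a linear, unmodified peptide; the masses are standard average residue values rounded to four decimals, and the helper names are hypothetical.

```python
# Average residue masses in daltons (free amino acid minus one water),
# limited to the linker residues discussed in the text.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782,
    "T": 101.1051, "D": 115.0886, "E": 129.1155,
}
WATER = 18.0153  # added once for the free N- and C-termini


def peptide_mass_da(seq: str) -> float:
    """Approximate average mass (Da) of a linear peptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def repeats_for_mass(unit: str, target_kda: float) -> int:
    """Smallest number of repeats of `unit` whose peptide reaches target_kda."""
    n = 1
    while peptide_mass_da(unit * n) < target_kda * 1000:
        n += 1
    return n


print(round(peptide_mass_da("GGGGS"), 2))  # 333.3
print(repeats_for_mass("GGGGS", 30))       # 96
```

Under these assumptions, a 30 kDa linker built only from GGGGS repeats would be roughly 480 residues long, far longer than the linkers of [0218] and [0222]; acidic residues such as glutamate and aspartate carry more mass per residue than glycine.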

[0224] In certain embodiments, the third polypeptide is oleosin.

[0225] Thus, one embodiment of the present invention relates to a polypeptide of Formula I that is fused to a second polypeptide at its C-terminus, optionally through a linker, and to a third polypeptide at its N-terminus, again optionally through another linker. Another embodiment of the invention relates to a series of fused polypeptides of Formula II

oleosin-linker 1-thioredoxin reductase-linker 2-thioredoxin  (II)

[0226] where “linker 1” refers to the linker between the polypeptide of Formula I and the third polypeptide, set forth above, and “linker 2” refers to the linker between the polypeptide of Formula I and the second polypeptide, set forth above. Likewise, some embodiments of the invention can include any other fusion protein comprising the polypeptide of Formula I, whether it is fused to another protein at its N-terminus, its C-terminus, or both. Specifically, the invention contemplates modifications of Formula II, or any other fusion of two polypeptides to the polypeptide of Formula I, in which the components occur in any order.
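The ordering in Formula II can be made concrete by tracking where each component lands in the fused chain. The following sketch uses short placeholder sequences (not real oleosin, TR or thioredoxin sequences) and a hypothetical helper `assemble_fusion`; reordering the list models the alternative arrangements contemplated in [0226].

```python
def assemble_fusion(parts):
    """Join (name, sequence) parts from N- to C-terminus.

    Returns the fused sequence and a dict giving each part's 1-based,
    inclusive (start, end) residue span in the fusion.
    """
    pieces, spans, start = [], {}, 1
    for name, seq in parts:
        spans[name] = (start, start + len(seq) - 1)
        pieces.append(seq)
        start += len(seq)
    return "".join(pieces), spans


# Placeholder sequences only; real oleosin, TR and thioredoxin sequences are
# far longer and are not reproduced here.
formula_ii = [
    ("oleosin", "MAAA"),
    ("linker 1", "GGGGS"),
    ("thioredoxin reductase", "MGTK"),
    ("linker 2", "GS"),
    ("thioredoxin", "MSDK"),
]
fusion, spans = assemble_fusion(formula_ii)
print(fusion)                          # MAAAGGGGSMGTKGSMSDK
print(spans["thioredoxin reductase"])  # (10, 13)
```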

[0227] In a preferred embodiment, the binding affinities of variant TR proteins for NADPH and NADH are determined. Suitable assays include, but are not limited to, quantitative comparisons of kinetic and equilibrium binding constants. The association rate constant (k_on), the dissociation rate constant (k_off), and the equilibrium dissociation constant (K_d) can be determined by surface plasmon resonance on a BIAcore instrument following the standard procedure in the literature [Pearce et al., Biochemistry 38:81-89 (1999)].
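For a simple 1:1 interaction between a variant TR protein and NADPH or NADH, the equilibrium constant follows from the two rate constants as K_d = k_off / k_on. The sketch below assumes that 1:1 (Langmuir) binding model, which may not hold for every variant; it is not the BIAcore evaluation software, and the rate values used are illustrative only.

```python
import math


def dissociation_constant(k_on: float, k_off: float) -> float:
    """K_d (M) for 1:1 binding from k_on (M^-1 s^-1) and k_off (s^-1)."""
    if k_on <= 0 or k_off < 0:
        raise ValueError("k_on must be positive and k_off non-negative")
    return k_off / k_on


def fraction_bound(conc_m: float, k_d: float) -> float:
    """Equilibrium fraction of the immobilized partner occupied at conc_m."""
    return conc_m / (conc_m + k_d)


def association_response(t_s, conc_m, k_on, k_off, r_max):
    """1:1 association phase: R(t) = R_eq * (1 - exp(-(k_on*C + k_off) * t))."""
    k_obs = k_on * conc_m + k_off
    r_eq = r_max * fraction_bound(conc_m, dissociation_constant(k_on, k_off))
    return r_eq * (1 - math.exp(-k_obs * t_s))


# Illustrative rates only: k_on = 1e5 M^-1 s^-1 and k_off = 1e-3 s^-1
# give a K_d of about 10 nM.
k_d = dissociation_constant(1e5, 1e-3)
```

In practice the rate constants come from fitting sensorgrams recorded at several analyte concentrations; the functions above only show how the fitted values relate to each other.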

[0228] In a preferred embodiment, the antigenic profile of the variant TR protein in the host animal is similar, and preferably identical, to the antigenic profile of the host TR; that is, the variant TR protein does not significantly stimulate the host organism (e.g., the patient) to mount an immune response, any immune response is not clinically relevant, and there is no allergic response or neutralization of the protein by an antibody. Thus, in a preferred embodiment, the variant TR protein does not contain additional or different epitopes from the TR. By “epitope” or “determinant” herein is meant a portion of a protein which will generate and/or bind an antibody. Thus, in most instances, no significant amounts of antibodies are generated to a variant TR protein. In general, this is accomplished by not significantly altering surface residues and by not adding any amino acid residues on the surface which can become glycosylated, as novel glycosylation can result in an immune response.

[0229] The variant TR proteins and nucleic acids of the invention are distinguishable from naturally occurring wild-type TR. By “naturally occurring” or “wild type” or grammatical equivalents, herein is meant an amino acid sequence or a nucleotide sequence that is found in nature and includes allelic variations; that is, an amino acid sequence or a nucleotide sequence that usually has not been intentionally modified. Accordingly, by “non-naturally occurring” or “synthetic” or “recombinant” or grammatical equivalents thereof, herein is meant an amino acid sequence or a nucleotide sequence that is not found in nature; that is, an amino acid sequence or a nucleotide sequence that usually has been intentionally modified. It is understood that once a recombinant nucleic acid is made and reintroduced into a host cell or organism, it will replicate non-recombinantly, i.e., using the in vivo cellular machinery of the host cell rather than in vitro manipulations; however, such nucleic acids, once produced recombinantly, although subsequently replicated non-recombinantly, are still considered recombinant for the purpose of the invention. Representative amino acid sequences of naturally occurring TR proteins are shown in FIG. 21. It should be noted that, unless otherwise stated, all positional numbering of TR proteins and variant TR proteins is based on these sequences. That is, as will be appreciated by those in the art, an alignment of TR proteins and variant TR proteins can be done using standard programs, as is outlined below, with the identification of “equivalent” positions between the two proteins.
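Equivalent positions, such as residue 181 in E. coli TR and residue 180 in B. subtilis TR, can be read off a pairwise alignment by counting non-gap residues in each row. A minimal sketch, assuming the two aligned rows come from an alignment tool and use '-' for gaps; the function name is hypothetical and the toy sequences are invented.

```python
def map_position(aligned_a: str, aligned_b: str, pos_a: int):
    """Residue number in sequence B equivalent to residue `pos_a` of sequence A.

    Both strings are rows of the same pairwise alignment ('-' = gap) and
    residues are numbered from 1. Returns None if residue pos_a of A is
    aligned to a gap in B.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have equal length")
    n_a = n_b = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_b != "-":
            n_b += 1
        if res_a != "-":
            n_a += 1
            if n_a == pos_a:
                return n_b if res_b != "-" else None
    raise ValueError("pos_a lies beyond the end of sequence A")


# Invented toy alignment: B lacks A's first residue, so numbering shifts by one.
print(map_position("MSKRV", "-SKRV", 4))  # 3
print(map_position("MSKRV", "-SKRV", 1))  # None
```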

[0230] Thus, in a preferred embodiment, the variant TR protein has an amino acid sequence that differs from a wild-type TR sequence (FIG. 21) by at least 1-5% of the residues. That is, the variant TR proteins of the invention are less than about 97-99% identical to a wild-type TR amino acid sequence. Accordingly, a protein is a “variant TR protein” if the overall homology of its sequence to a wild-type TR protein sequence is preferably less than about 99%, more preferably less than about 98%, even more preferably less than about 97%, and most preferably less than about 95%. In some embodiments, the homology will be as low as about 75-80%. Stated differently, variant TR proteins have at least 1 residue that differs from the wild-type TR sequence (i.e., FIG. 21), with at least about 2, 3, 4, 5, and up to 50 different residues. Preferably, variant TR proteins have 1 to 3 different residues; more preferably, they have 3 to 5 different residues. In other preferred embodiments, variant TR proteins have 5 to 10, 10 to 15, 15 to 25, or 25 to 35 different residues.

[0231] Homology in this context means sequence similarity or identity, with identity being preferred. As is known in the art, a number of different programs can be used to identify whether a protein (or nucleic acid, as discussed below) has sequence identity or similarity to a known sequence. Sequence identity and/or similarity is determined using standard techniques known in the art, including, but not limited to, the local sequence identity algorithm of Smith & Waterman, Adv. Appl. Math., 2:482 (1981), the sequence identity alignment algorithm of Needleman & Wunsch, J. Mol. Biol., 48:443 (1970), the search for similarity method of Pearson & Lipman, Proc. Natl. Acad. Sci. U.S.A., 85:2444 (1988), computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Drive, Madison, Wis.), the Best Fit sequence program described by Devereux et al., Nucl. Acid Res., 12:387-395 (1984), preferably using the default settings, or inspection. Preferably, percent identity is calculated by FastDB based upon the following parameters: mismatch penalty of 1; gap penalty of 1; gap size penalty of 0.33; and joining penalty of 30 (“Current Methods in Sequence Comparison and Analysis,” Macromolecule Sequencing and Synthesis, Selected Methods and Applications, pp 127-149 (1988), Alan R. Liss, Inc.).
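The global alignment algorithm of Needleman & Wunsch cited in [0231] can be sketched in a few lines of dynamic programming. The scoring values here (match +1, mismatch -1, gap -1, linear gap cost) are assumptions for illustration; they are not the FastDB or GCG parameters quoted above, and real protein comparisons normally use a substitution matrix.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)

    def pair(i, j):  # score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring the diagonal on ties.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("MKRV", "MRV"))  # (2, 'MKRV', 'M-RV')
```

The Smith & Waterman local variant differs mainly in clamping cell scores at zero and tracing back from the highest-scoring cell rather than the corner.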

[0232] An example of a useful algorithm is PILEUP. PILEUP creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It can also plot a tree showing the clustering relationships used to create the alignment. PILEUP uses a simplification of the progressive alignment method of Feng & Doolittle, J. Mol. Evol. 25:351-360 (1987); the method is similar to that described by Higgins & Sharp, CABIOS 5:151-153 (1989). Useful PILEUP parameters include a default gap weight of 3.00, a default gap length weight of 0.10, and weighted end gaps.
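Progressive methods such as PILEUP first cluster the sequences into a guide tree and then align them along that tree. The clustering step is commonly described as UPGMA (unweighted pair-group method using arithmetic averages); the sketch below shows only that step, assuming pairwise distances have already been computed (for example, as 1 minus fractional identity). It is not GCG code, and the function name is hypothetical.

```python
def upgma(labels, dist):
    """Build a UPGMA guide tree from a symmetric matrix of pairwise distances.

    labels: sequence names; dist: square list of lists, dist[i][j] = distance.
    Returns the tree as nested tuples, e.g. (('A', 'B'), 'C').
    """
    trees = list(labels)
    sizes = [1] * len(trees)            # number of leaves under each cluster
    d = [list(row) for row in dist]     # work on a copy
    while len(trees) > 1:
        n = len(trees)
        # Closest pair of current clusters (the first pair found wins ties).
        i, j = min(((p, q) for p in range(n) for q in range(p + 1, n)),
                   key=lambda pq: d[pq[0]][pq[1]])
        # Distance from the merged cluster to every cluster k: the mean over
        # all leaf pairs, i.e. a size-weighted mean of rows i and j.
        merged = [(sizes[i] * d[i][k] + sizes[j] * d[j][k]) / (sizes[i] + sizes[j])
                  for k in range(n)]
        keep = [k for k in range(n) if k not in (i, j)]
        d = [[d[a][b] for b in keep] + [merged[a]] for a in keep]
        d.append([merged[a] for a in keep] + [0.0])
        trees = [trees[k] for k in keep] + [(trees[i], trees[j])]
        sizes = [sizes[k] for k in keep] + [sizes[i] + sizes[j]]
    return trees[0]


matrix = [[0, 2, 6, 10],
          [2, 0, 6, 10],
          [6, 6, 0, 4],
          [10, 10, 4, 0]]
print(upgma(["A", "B", "C", "D"], matrix))  # (('A', 'B'), ('C', 'D'))
```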

[0233] Another example of a useful algorithm is the BLAST algorithm, described in: Altschul et al., J. Mol. Biol. 215, 403-410 (1990); Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997); and Karlin et al., Proc. Natl. Acad. Sci. U.S.A. 90:5873-5877 (1993). A particularly useful BLAST program is the WU-BLAST-2 program, which was obtained from Altschul et al., Methods in Enzymology, 266:460-480 (1996) [http://blast.wustl.edu/blast/README.html]. WU-BLAST-2 uses several search parameters, most of which are set to the default values. The adjustable parameters are set with the following values: overlap span=1, overlap fraction=0.125, word threshold (T)=11. The HSP S and HSP S2 parameters are dynamic values and are established by the program itself depending upon the composition of the particular sequence and composition of the particular database against which the sequence of interest is being searched; however, the values may be adjusted to increase sensitivity. An additional useful algorithm is gapped BLAST as reported by Altschul et al., Nucl. Acids Res., 25:3389-3402 (1997). Gapped BLAST uses BLOSUM-62 substitution scores; a threshold T parameter set to 9; the two-hit method to trigger ungapped extensions; a cost of 10+k for a gap of length k; X_u set to 16; and X_g set to 40 for the database search stage and to 67 for the output stage of the algorithms. Gapped alignments are triggered by a score corresponding to about 22 bits.
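Two of the quantities in [0233] reduce to short formulas: the gap cost of 10+k for a gap of length k, and the conversion between raw scores and bits. The bit conversion below uses the standard Karlin-Altschul normalization S' = (λS - ln K)/ln 2. λ and K depend on the scoring system and are not given in the text, so they are left as inputs; the function names are hypothetical.

```python
import math


def gap_cost(k: int, opening: float = 10, per_residue: float = 1) -> float:
    """Cost charged for a gap of length k; the defaults give the 10 + k scheme."""
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return opening + per_residue * k


def bits_from_raw(raw_score: float, lam: float, k: float) -> float:
    """Karlin-Altschul normalized (bit) score: S' = (lam * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def raw_from_bits(bits: float, lam: float, k: float) -> float:
    """Raw score S that corresponds to a given bit score (inverse of bits_from_raw)."""
    return (bits * math.log(2) + math.log(k)) / lam


print(gap_cost(1), gap_cost(3))  # 11 13
```

Once λ and K for the chosen matrix and gap costs are known, `raw_from_bits(22, lam, k)` gives the raw score corresponding to the roughly 22-bit trigger for gapped alignments.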

[0234] A % amino acid sequence identity value is determined by the number of matching identical residues divided by the total number of residues of the “longer” sequence in the aligned region. The “longer” sequence is the one having the most actual residues in the aligned region (gaps introduced by WU-Blast-2 to maximize the alignment score are ignored).

[0235] In a similar manner, “percent (%) nucleic acid sequence identity” with respect to the coding sequence of the polypeptides identified herein is defined as the percentage of nucleotide residues in a candidate sequence that are identical with the nucleotide residues in that coding sequence. A preferred method utilizes the BLASTN module of WU-BLAST-2 set to the default parameters, with overlap span and overlap fraction set to 1 and 0.125, respectively.

[0236] The alignment may include the introduction of gaps in the sequences to be aligned. In addition, for sequences which contain either more or fewer amino acids than a wild-type TR sequence (see, e.g., FIG. 2 and FIG. 16N), it is understood that in one embodiment the percentage of sequence identity will be determined based on the number of identical amino acids in relation to the total number of amino acids. Thus, for example, in one embodiment, the sequence identity of sequences shorter than a wild-type TR protein sequence (see, e.g., FIG. 2 and FIG. 16N), as discussed below, will be determined using the number of amino acids in the shorter sequence. In percent identity calculations, relative weight is not assigned to various manifestations of sequence variation, such as insertions, deletions, substitutions, etc.

[0237] In one embodiment, only identities are scored positively (+1) and all forms of sequence variation, including gaps, are assigned a value of “0”, which obviates the need for a weighted scale or parameters as described below for sequence similarity calculations. Percent sequence identity can be calculated, for example, by dividing the number of matching identical residues by the total number of residues of the “shorter” sequence in the aligned region and multiplying by 100. The “shorter” sequence is the one having the fewest actual residues in the aligned region.
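The conventions of [0234] and [0237] differ only in the denominator. A minimal sketch, assuming the aligned region is the whole of two equal-length aligned rows with '-' for gaps; identities score 1, and mismatches and gap columns score 0. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, denominator: str = "shorter") -> float:
    """Percent identity of a pairwise alignment.

    Identities score +1; mismatches and gap columns score 0. The count is
    divided by the number of actual (non-gap) residues of the shorter or the
    longer sequence in the aligned region, then multiplied by 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have equal length")
    identities = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    len_a = sum(1 for c in aligned_a if c != "-")
    len_b = sum(1 for c in aligned_b if c != "-")
    if denominator == "shorter":
        total = min(len_a, len_b)
    elif denominator == "longer":
        total = max(len_a, len_b)
    else:
        raise ValueError("denominator must be 'shorter' or 'longer'")
    return 100.0 * identities / total


print(percent_identity("MKRVL", "M-RVI"))            # 75.0
print(percent_identity("MKRVL", "M-RVI", "longer"))  # 60.0
```

The same alignment therefore reports 75% identity against the shorter sequence and 60% against the longer one, which matters when comparing a value against the thresholds of [0230].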

[0238] Thus, the variant TR proteins of the present invention may be shorter or longer than the amino acid sequence of wild-type TR proteins (i.e., FIG. 21). Thus, in a preferred embodiment, included within the definition of variant TR proteins are portions or fragments of the sequences depicted herein. Fragments of variant TR proteins are considered variant TR proteins if they a) share at least one antigenic epitope; b) have at least the indicated homology; and c) preferably have variant TR biological activity as defined herein.

[0239] In a preferred embodiment, as is more fully outlined below, the variant TR proteins include amino acid variations, as compared to a wild-type TR, in addition to those outlined herein. In addition, as outlined herein, any of the variations depicted herein may be combined in any way to form additional novel variant TR proteins.

[0240] In addition, variant TR proteins can be made that are longer than those depicted in the figures, for example, by the addition of epitope or purification tags, as outlined herein, the addition of other fusion sequences, etc. For example, the variant TR proteins of the invention may be fused to other therapeutic proteins or to other proteins such as Fc or serum albumin for pharmacokinetic purposes. See, for example, U.S. Pat. Nos. 5,766,883 and 5,876,969, both of which are expressly incorporated by reference.

[0241] In a preferred embodiment, the variant TR proteins of the invention are human TR conformers. By “conformer” herein is meant a protein that has a protein backbone 3D structure that is virtually the same but has significant differences in the amino acid side chains. That is, the variant TR proteins of the invention define a conformer set, wherein all of the proteins of the set share a backbone structure and yet have sequences that differ by at least about 1%, 3%, or 5%. The three-dimensional backbone structure of a variant TR protein thus substantially corresponds to the three-dimensional backbone structure of human TR. “Backbone” in this context means the non-side chain atoms: the nitrogen, carbonyl carbon and oxygen, and the α-carbon, and the hydrogens attached to the nitrogen and α-carbon. To be considered a conformer, a protein must have backbone atoms that are no more than 2 angstroms from the human TR structure, with no more than 1.5 angstroms being preferred, and no more than 1 angstrom being particularly preferred. In general, these distances may be determined in two ways. In one embodiment, each potential conformer is crystallized and its three-dimensional structure determined. Alternatively, as the former is quite tedious, the sequence of each potential conformer is run in the PDA program to determine whether it is a conformer.
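The distance criterion of [0241] can be checked directly once two structures are superimposed. This sketch assumes the backbone atoms of the candidate and of the human TR structure are listed in the same order and have already been superimposed (for example by a least-squares fit, not shown). It reads "no more than 2 angstroms" as a limit on every individual backbone atom, which is one possible interpretation, and also reports the RMSD. Function names and coordinates are hypothetical.

```python
import math


def backbone_deviations(coords_a, coords_b):
    """Per-atom distances (angstroms) between two equally ordered lists of
    backbone atom coordinates, assumed to be superimposed already."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("structures must list the same, non-empty set of atoms")
    return [math.dist(p, q) for p, q in zip(coords_a, coords_b)]


def rmsd(coords_a, coords_b):
    devs = backbone_deviations(coords_a, coords_b)
    return math.sqrt(sum(dev * dev for dev in devs) / len(devs))


def conformer_tier(coords_a, coords_b):
    """Tightest of the 1.0 / 1.5 / 2.0 angstrom limits met by every atom,
    or None if some atom deviates by more than 2 angstroms."""
    worst = max(backbone_deviations(coords_a, coords_b))
    for limit in (1.0, 1.5, 2.0):
        if worst <= limit:
            return limit
    return None


reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
candidate = [(0.0, 0.0, 0.5), (1.5, 0.0, 0.0), (3.0, 1.2, 0.0)]
print(conformer_tier(reference, candidate))  # 1.5
```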

[0242] In alternative embodiments, the variant TR proteins of the invention may be conformers of any of the TR proteins listed in FIG. 21.

[0243] Variant TR proteins may also be identified as being encoded by variant TR nucleic acids. In the case of the nucleic acid, the overall homology of the nucleic acid sequence is commensurate with amino acid homology but takes into account the degeneracy in the genetic code and codon bias of different organisms. Accordingly, the nucleic acid sequence homology may be either lower or higher than that of the protein sequence, with lower homology being preferred.

[0244] In a preferred embodiment, a variant TR nucleic acid encodes a variant TR protein. As will be appreciated by those in the art, due to the degeneracy of the genetic code, an extremely large number of nucleic acids may be made, all of which encode the variant TR proteins of the present invention. Thus, having identified a particular amino acid sequence, those skilled in the art could make any number of different nucleic acids by simply modifying the sequence of one or more codons in a way which does not change the amino acid sequence of the variant TR.
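The scale of the degeneracy argument in [0244] is easy to quantify: the number of coding sequences for a protein is the product of the number of synonymous codons at each position. A sketch using the standard genetic code (NCBI translation table 1); organism-specific codon bias, mentioned in [0243], is ignored, and the helper names are hypothetical.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Invert the table: amino acid -> list of synonymous codons.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def count_encodings(protein: str) -> int:
    """Number of distinct coding sequences (stop codon excluded) for a protein."""
    return prod(len(SYNONYMS[aa]) for aa in protein)


def all_encodings(protein: str):
    """Yield every coding sequence for a short protein; grows exponentially."""
    for codons in product(*(SYNONYMS[aa] for aa in protein)):
        yield "".join(codons)


def translate(dna: str) -> str:
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


print(count_encodings("MK"))   # 2
print(count_encodings("LRS"))  # 216
```

For a full-length TR protein of a few hundred residues the product is astronomically large, which is the point made in [0244].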

[0245] In one embodiment, the nucleic acid homology is determined through hybridization studies. Thus, for example, nucleic acids which hybridize under high stringency to the nucleic acid sequence shown in FIG. 21 or its complement and encode a variant TR protein are considered variant TR genes.

[0246] High stringency conditions are known in the art; see, for example, Maniatis et al., Molecular Cloning: A Laboratory Manual, 2d Edition, 1989, and Short Protocols in Molecular Biology, ed. Ausubel, et al., both of which are hereby incorporated by reference. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, “Overview of principles of hybridization and the strategy of nucleic acid assays” (1993). Generally, stringent conditions are selected to be about 5-10 °C lower than the thermal melting point (T_m) for the specific sequence at a defined ionic strength and pH. The T_m is the temperature (under defined ionic strength, pH and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at T_m, 50% of the probes are occupied at equilibrium). Stringent conditions will be those in which the salt concentration is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3, and the temperature is at least about 30 °C for short probes (e.g., 10 to 50 nucleotides) and at least about 60 °C for long probes (e.g., greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.
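For short oligonucleotide probes, a common rough estimate of T_m is the Wallace rule, T_m ≈ 2 °C × (A + T) + 4 °C × (G + C). The sketch below applies it and then the 5-10 °C offset given in [0246]. The rule is an approximation for short probes at relatively high salt, is not taken from the cited references, and ignores the salt, formamide and length corrections that matter for longer probes; the function names are hypothetical.

```python
def wallace_tm(probe: str) -> int:
    """Wallace rule of thumb: Tm (deg C) ~ 2*(A+T) + 4*(G+C) for short DNA oligos."""
    probe = probe.upper()
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    return 2 * at + 4 * gc


def stringent_window(tm: float, low_offset: float = 10, high_offset: float = 5):
    """Hybridization temperature range about 5-10 deg C below Tm."""
    return tm - low_offset, tm - high_offset


tm = wallace_tm("ATGCATGCATGCATGC")
print(tm)                    # 48
print(stringent_window(tm))  # (38, 43)
```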

[0247] In another embodiment, less stringent hybridization conditions are used; for example, moderate or low stringency conditions may be used, as are known in the art; see Maniatis and Ausubel, supra, and Tijssen, supra.

[0248] The variant TR proteins and nucleic acids of the present invention are recombinant. As used herein, “nucleic acid” may refer to either DNA or RNA, or molecules which contain both deoxy- and ribonucleotides. The nucleic acids include genomic DNA, cDNA and oligonucleotides, including sense and anti-sense nucleic acids. Such nucleic acids may also contain modifications in the ribose-phosphate backbone to increase the stability and half-life of such molecules in physiological environments.

[0249] The nucleic acid may be double-stranded, single-stranded, or contain portions of both double-stranded and single-stranded sequence. As will be appreciated by those in the art, the depiction of a single strand (“Watson”) also defines the sequence of the other strand (“Crick”); thus the sequence depicted in FIG. 6 also includes the complement of the sequence. By the term “recombinant nucleic acid” herein is meant nucleic acid, originally formed in vitro, in general, by the manipulation of nucleic acid by endonucleases, in a form not normally found in nature. Thus, an isolated variant TR nucleic acid in a linear form, or an expression vector formed in vitro by ligating DNA molecules that are not normally joined, are both considered recombinant for the purposes of this invention. It is understood that once a recombinant nucleic acid is made and reintroduced into a host cell or organism, it will replicate non-recombinantly, i.e., using the in vivo cellular machinery of the host cell rather than in vitro manipulations; however, such nucleic acids, once produced recombinantly, although subsequently replicated non-recombinantly, are still considered recombinant for the purposes of the invention.

[0250] Similarly, a “recombinant protein” is a protein made using recombinant techniques, i.e., through the expression of a recombinant nucleic acid as described above. A recombinant protein is distinguished from naturally occurring protein by at least one or more characteristics. For example, the protein may be isolated or purified away from some or all of the proteins and compounds with which it is normally associated in its wild-type host, and thus may be substantially pure. For example, an isolated protein is unaccompanied by at least some of the material with which it is normally associated in its natural state, preferably constituting at least about 0.5%, more preferably at least about 5%, by weight of the total protein in a given sample. A substantially pure protein comprises at least about 75% by weight of the total protein, with at least about 80% being preferred, and at least about 90% being particularly preferred. The definition includes the production of a variant TR protein from one organism in a different organism or host cell. Alternatively, the protein may be made at a significantly higher concentration than is normally seen, through the use of an inducible promoter or high expression promoter, such that the protein is made at increased concentration levels. Furthermore, all of the variant TR proteins outlined herein are in a form not normally found in nature, as they contain amino acid substitutions, insertions and deletions, with substitutions being preferred, as discussed below.

[0251] Also included within the definition of variant TR proteins of the present invention are amino acid sequence variants of the variant TR sequences outlined herein and shown in the Figures. That is, the variant TR proteins may contain additional variable positions as compared to human TR. These variants fall into one or more of three classes: substitutional, insertional or deletional variants. These variants ordinarily are prepared by site-specific mutagenesis of nucleotides in the DNA encoding a variant TR protein, using cassette or PCR mutagenesis or other techniques well known in the art, to produce DNA encoding the variant, and thereafter expressing the DNA in recombinant cell culture as outlined above. However, variant TR protein fragments having up to about 100-150 residues may be prepared by in vitro synthesis using established techniques. Amino acid sequence variants are characterized by the predetermined nature of the variation, a feature that sets them apart from naturally occurring allelic or interspecies variation of the variant TR protein amino acid sequence. The variants typically exhibit the same qualitative biological activity as the naturally occurring analogue, although variants can also be selected which have modified characteristics, as will be more fully outlined below.

[0252] While the site or region for introducing an amino acid sequence variation is predetermined, the mutation per se need not be predetermined. For example, in order to optimize the performance of a mutation at a given site, random mutagenesis may be conducted at the target codon or region and the expressed variant TR proteins screened for the optimal combination of desired activity. Techniques for making substitution mutations at predetermined sites in DNA having a known sequence are well known, for example, M13 primer mutagenesis and PCR mutagenesis. Screening of the mutants is done using assays of variant TR protein activities.

[0253] Amino acid substitutions are typically of single residues; insertions usually will be on the order of from about 1 to 20 amino acids, although considerably larger insertions may be tolerated. Deletions range from about 1 to about 20 residues, although in some cases deletions may be much larger.

[0254] Substitutions, deletions, insertions or any combination thereof may be used to arrive at a final derivative. Generally, these changes are made at a few amino acids to minimize the alteration of the molecule. However, larger changes may be tolerated in certain circumstances. When small alterations in the characteristics of the variant TR protein are desired, substitutions are generally made in accordance with the following chart:

CHART I
Original Residue    Exemplary Substitutions
Ala                 Ser
Arg                 Lys
Asn                 Gln, His
Asp                 Glu
Cys                 Ser, Ala
Gln                 Asn
Glu                 Asp
Gly                 Pro
His                 Asn, Gln
Ile                 Leu, Val
Leu                 Ile, Val
Lys                 Arg, Gln, Glu
Met                 Leu, Ile
Phe                 Met, Leu, Tyr
Ser                 Thr
Thr                 Ser
Trp                 Tyr
Tyr                 Trp, Phe
Val                 Ile, Leu
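Chart I can be treated as a lookup table, for example to list every conservative single-residue variant of a sequence. A sketch, assuming sequences are given as lists of three-letter codes; note that the chart is not symmetric (Gly to Pro is listed, but Pro has no entry of its own). The names are hypothetical.

```python
# Chart I: original residue -> exemplary conservative substitutions.
CHART_I = {
    "Ala": ("Ser",), "Arg": ("Lys",), "Asn": ("Gln", "His"), "Asp": ("Glu",),
    "Cys": ("Ser", "Ala"), "Gln": ("Asn",), "Glu": ("Asp",), "Gly": ("Pro",),
    "His": ("Asn", "Gln"), "Ile": ("Leu", "Val"), "Leu": ("Ile", "Val"),
    "Lys": ("Arg", "Gln", "Glu"), "Met": ("Leu", "Ile"),
    "Phe": ("Met", "Leu", "Tyr"), "Ser": ("Thr",), "Thr": ("Ser",),
    "Trp": ("Tyr",), "Tyr": ("Trp", "Phe"), "Val": ("Ile", "Leu"),
}


def is_chart_substitution(original: str, replacement: str) -> bool:
    """True if Chart I lists `replacement` for `original`."""
    return replacement in CHART_I.get(original, ())


def conservative_single_mutants(residues):
    """Yield (position, original, replacement) for every Chart I swap available
    in a sequence given as a list of three-letter codes (1-based positions)."""
    for pos, res in enumerate(residues, start=1):
        for new in CHART_I.get(res, ()):
            yield pos, res, new


print(len(list(conservative_single_mutants(["Met", "Lys", "Ala"]))))  # 6
```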

[0255] Substantial changes in function or immunological identity are made by selecting substitutions that are less conservative than those shown in Chart I. For example, substitutions may be made which more significantly affect: the structure of the polypeptide backbone in the area of the alteration, for example the alpha-helical or beta-sheet structure; the charge or hydrophobicity of the molecule at the target site; or the bulk of the side chain. The substitutions which in general are expected to produce the greatest changes in the polypeptide's properties are those in which (a) a hydrophilic residue, e.g., seryl or threonyl, is substituted for (or by) a hydrophobic residue, e.g., leucyl, isoleucyl, phenylalanyl, valyl or alanyl; (b) a cysteine or proline is substituted for (or by) any other residue; (c) a residue having an electropositive side chain, e.g., lysyl, arginyl, or histidyl, is substituted for (or by) an electronegative residue, e.g., glutamyl or aspartyl; or (d) a residue having a bulky side chain, e.g., phenylalanine, is substituted for (or by) one not having a side chain, e.g., glycine.
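Categories (a) to (d) of [0255] can be encoded as residue groups. This sketch uses only the example residues named in each category, so it under-reports for residues the text does not list; the group and function names are hypothetical.

```python
# Example residues named in categories (a)-(d) of [0255].
HYDROPHILIC = {"Ser", "Thr"}
HYDROPHOBIC = {"Leu", "Ile", "Phe", "Val", "Ala"}
CYS_OR_PRO = {"Cys", "Pro"}
ELECTROPOSITIVE = {"Lys", "Arg", "His"}
ELECTRONEGATIVE = {"Glu", "Asp"}
BULKY = {"Phe"}
NO_SIDE_CHAIN = {"Gly"}


def _crosses(x, y, group_1, group_2):
    """True if the substitution moves between the two groups in either direction."""
    return (x in group_1 and y in group_2) or (x in group_2 and y in group_1)


def substantial_categories(original, replacement):
    """Return the categories (a)-(d) that a substitution falls into."""
    found = []
    if _crosses(original, replacement, HYDROPHILIC, HYDROPHOBIC):
        found.append("a")
    if original != replacement and (original in CYS_OR_PRO or replacement in CYS_OR_PRO):
        found.append("b")
    if _crosses(original, replacement, ELECTROPOSITIVE, ELECTRONEGATIVE):
        found.append("c")
    if _crosses(original, replacement, BULKY, NO_SIDE_CHAIN):
        found.append("d")
    return found


print(substantial_categories("Ser", "Leu"))  # ['a']
print(substantial_categories("Ile", "Leu"))  # []
```

An empty result does not prove a substitution is conservative; it only means none of the listed examples apply.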

[0256] The variants typically exhibit the same qualitative biological activity and will elicit the same immune response as the original variant TR protein, although variants also are selected to modify the characteristics of the variant TR proteins as needed. Alternatively, the variant may be designed such that the biological activity of the variant TR protein is altered. For example, glycosylation sites may be altered or removed. Similarly, the biological function may be altered; for example, in some instances it may be desirable to have more or less potent TR activity.

[0257] The variant TR proteins and nucleic acids of the invention can be made in a number of ways. Individual nucleic acids and proteins can be made as known in the art and outlined below. Alternatively, libraries of variant TR proteins can be made for testing.

[0258] In a preferred embodiment, sets or libraries of variant TR proteins are generated from a probability distribution table. As outlined herein, there are a variety of methods of generating a probability distribution table, including using PDA, sequence alignments, forcefield calculations such as SCMF calculations, etc. In addition, the probability distribution can be used to generate information entropy scores for each position, as a measure of the mutational frequency observed in the library.
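One common way to compute an information entropy score is the Shannon entropy of the residue distribution at each position. The sketch below assumes that definition (log base 2, so scores are in bits); the text does not fix a formula. A position where one residue has frequency 1.0 scores 0, a position split evenly between two residues scores 1 bit, and an even four-way split scores 2 bits.

```python
import math


def position_entropy(freqs: dict[str, float]) -> float:
    """Shannon entropy, in bits, of the residue distribution at one position."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def entropy_scores(table: list[dict[str, float]]) -> list[float]:
    """One entropy score per position of a probability distribution table."""
    return [position_entropy(column) for column in table]
```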

[0259] In this embodiment, the frequency of each amino acid residue at each variable position in the list is identified. Frequencies can be thresholded, wherein any variant frequency lower than a cutoff is set to zero. This cutoff is preferably 1%, 2%, 5%, 10% or 20%, with 10% being particularly preferred. These frequencies are then built into the variant TR library. That is, as above, these variable positions are collected and all possible combinations are generated, but the amino acid residues that “fill” the library are utilized on a frequency basis. Thus, in a non-frequency-based library, a variable position that has 5 possible residues will have 20% of the proteins comprising that variable position with the first possible residue, 20% with the second, etc. However, in a frequency-based library, a variable position that has 5 possible residues with frequencies of 10%, 15%, 25%, 30% and 20%, respectively, will have 10% of the proteins comprising that variable position with the first possible residue, 15% of the proteins with the second residue, 25% with the third, etc. As will be appreciated by those in the art, the actual frequency may depend on the method used to actually generate the proteins; for example, exact frequencies may be possible when the proteins are synthesized. However, when the frequency-based primer system outlined below is used, the actual frequencies at each position will vary, as outlined below.
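The thresholding and frequency-based filling described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method used to build physical libraries: it draws each member independently, position by position, using the thresholded frequencies as relative weights. The text does not say whether a thresholded table is renormalized; here `random.choices` implicitly renormalizes the surviving weights.

```python
import random


def threshold(freqs: dict[str, float], cutoff: float = 0.10) -> dict[str, float]:
    """Set any residue frequency below `cutoff` to zero (10% is the particularly preferred value)."""
    return {aa: (p if p >= cutoff else 0.0) for aa, p in freqs.items()}


def sample_library(table: list[dict[str, float]], n_members: int, seed: int = 0) -> list[str]:
    """Draw library members so each position follows the table's relative frequencies."""
    rng = random.Random(seed)
    library = []
    for _ in range(n_members):
        member = []
        for freqs in table:
            residues = [aa for aa, p in freqs.items() if p > 0]  # skip thresholded-out residues
            weights = [freqs[aa] for aa in residues]
            member.append(rng.choices(residues, weights=weights, k=1)[0])
        library.append("".join(member))
    return library
```

With the 10%, 15%, 25%, 30% and 20% example from the text as a one-position table, roughly 10% of sampled members carry the first residue and roughly 30% carry the fourth.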

[0260] As will be appreciated by those in the art and outlined herein, probability distribution tables can be generated in a variety of ways. In addition to the methods outlined herein, self-consistent mean field (SCMF) methods can be used in the direct generation of probability tables. SCMF is a deterministic computational method that uses a mean field description of rotamer interactions to calculate energies. A probability table generated in this way can be used to create libraries as described herein. SCMF can be used in three ways. First, the frequencies of amino acids, and of the rotamers for each amino acid, are listed at each position, with the probabilities determined directly from SCMF (see Delarue et al., Pac. Symp. Biocomput. 109-21 (1997), expressly incorporated by reference). Second, highly variable positions and non-variable positions can be identified. Third, another method is used to determine which sequence is jumped to during a search of sequence space; SCMF is used to obtain an accurate energy for that sequence; this energy is then used to rank it and create a rank-ordered list of sequences (similar to a Monte Carlo sequence list). A probability table showing the frequencies of amino acids at each position can then be calculated from this list (Koehl et al., J. Mol. Biol. 239:249 (1994); Koehl et al., Nat. Struc. Biol. 2:163 (1995); Koehl et al., Curr. Opin. Struct. Biol. 6:222 (1996); Koehl et al., J. Mol. Biol. 293:1183 (1999); Koehl et al., J. Mol. Biol. 293:1161 (1999); Lee, J. Mol. Biol. 236:918 (1994); and Vasquez, Biopolymers 36:53-70 (1995), all of which are expressly incorporated by reference). Similar methods include, but are not limited to, OPLS-AA (Jorgensen, et al., J. Am. Chem. Soc. (1996), v 118, pp 11225-11236; Jorgensen, W. L.; BOSS, Version 4.1; Yale University: New Haven, Conn. (1999)); OPLS (Jorgensen, et al., J. Am. Chem. Soc. (1988), v 110, pp 1657ff; Jorgensen, et al., J. Am. Chem. Soc. (1990), v 112, pp 4768ff); UNRES (United Residue Forcefield; Liwo, et al., Protein Science (1993), v 2, pp 1697-1714; Liwo, et al., Protein Science (1993), v 2, pp 1715-1731; Liwo, et al., J. Comp. Chem. (1997), v 18, pp 849-873; Liwo, et al., J. Comp. Chem. (1997), v 18, pp 874-884; Liwo, et al., J. Comp. Chem. (1998), v 19, pp 259-276; Forcefield for Protein Structure Prediction (Liwo, et al., Proc. Natl. Acad. Sci. USA (1999), v 96, pp 5482-5485)); ECEPP/3 (Liwo et al., J. Protein Chem. 1994 May;13(4):375-80); AMBER 1.1 force field (Weiner, et al., J. Am. Chem. Soc. v 106, pp 765-784); AMBER 3.0 force field (U. C. Singh et al., Proc. Natl. Acad. Sci. USA 82:755-759); CHARMM and CHARMM22 (Brooks, et al., J. Comp. Chem. v 4, pp 187-217); cvff3.0 (Dauber-Osguthorpe, et al., (1988) Proteins: Structure, Function and Genetics, v 4, pp 31-47); cff91 (Maple, et al., J. Comp. Chem. v 15, 162-182). In addition, the DISCOVER (cvff and cff91) and AMBER forcefields are used in the INSIGHT molecular modeling package (Biosym/MSI, San Diego, Calif.) and CHARMM is used in the QUANTA molecular modeling package (Biosym/MSI, San Diego, Calif.).
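The third SCMF route above (rank sequences by energy, then tabulate residue frequencies from the rank-ordered list) can be sketched as below. The energies are placeholders supplied by the caller; this sketch does not implement SCMF or any of the named forcefields, and it assumes that lower energy is better and that all sequences have equal length.

```python
from collections import Counter


def probability_table_from_ranked(energies: dict[str, float], top_n: int) -> list[dict[str, float]]:
    """Rank sequences by energy (lowest first), keep the top_n, and tabulate per-position frequencies."""
    ranked = sorted(energies, key=energies.get)[:top_n]
    table = []
    for i in range(len(ranked[0])):
        counts = Counter(seq[i] for seq in ranked)  # residue counts at position i
        table.append({aa: c / len(ranked) for aa, c in counts.items()})
    return table
```

With hypothetical energies for "LKV", "IKV", "LRV" and a high-energy "AAA", keeping the top three gives frequencies of 2/3 L and 1/3 I at the first position and 1.0 V at the last.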

[0261] In addition, as outlined herein, a preferred method of generating a probability distribution table is through the use of sequence alignment programs. The probability table can also be obtained by a combination of sequence alignments and computational approaches. For example, one can add amino acids found in the alignment of homologous sequences to the result of the computation. Preferably, one can add the wild type amino acid identity to the probability table if it is not found in the computation.
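The combined approach described above can be sketched as a post-processing step on a computed table. Two assumptions, neither from the text: residues found only in the alignment or in the wild type are given a fixed hypothetical `floor` frequency, and each position is then renormalized to sum to 1.

```python
def augment_table(computed: list[dict[str, float]], alignment_residues: list[str],
                  wild_type: str, floor: float = 0.05) -> list[dict[str, float]]:
    """Add alignment-derived residues and the wild type residue to a computed table.

    alignment_residues[i] holds the residues seen at position i among homologs.
    Missing residues get the hypothetical `floor` frequency; each position is then
    renormalized so its frequencies sum to 1.
    """
    augmented = []
    for i, freqs in enumerate(computed):
        column = dict(freqs)
        for aa in set(alignment_residues[i]) | {wild_type[i]}:
            if column.get(aa, 0.0) == 0.0:  # only add residues the computation missed
                column[aa] = floor
        total = sum(column.values())
        augmented.append({aa: p / total for aa, p in column.items()})
    return augmented
```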

[0262] As will be appreciated, a variant TR library created by recombining variable positions and/or residues at the variable positions may not be in a rank-ordered list. In some embodiments, the entire list may simply be made and tested. Alternatively, in a preferred embodiment, the variant TR library is also in the form of a rank-ordered list. This may be done for several reasons, including that the library is still too large to generate experimentally, or for predictive purposes. This may be done in several ways.

[0263] In one embodiment, the scoring functions of PDA are used to rank the library members. Alternatively, statistical methods could be used. For example, the library may be ranked by frequency score; that is, proteins containing the most high-frequency residues could be ranked higher, etc. This may be done by adding or multiplying the frequencies at each variable position to generate a numerical score. Similarly, the different positions of the library could be weighted and the proteins then scored; for example, those containing certain residues could be arbitrarily ranked.
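A frequency score of the kind described above can be computed by adding or multiplying per-position frequencies. In this Python sketch each library member is written as a string of its residues at the variable positions only, and the optional `weights` argument is a hypothetical illustration of position weighting (multipliers when adding, exponents when multiplying).

```python
import math
from typing import Optional


def frequency_score(seq: str, table: list[dict[str, float]], mode: str = "sum",
                    weights: Optional[list[float]] = None) -> float:
    """Score a member by adding or multiplying the frequency of its residue at each variable position."""
    if weights is None:
        weights = [1.0] * len(table)
    freqs = [table[i].get(aa, 0.0) for i, aa in enumerate(seq)]
    if mode == "sum":
        return sum(w * f for w, f in zip(weights, freqs))
    if mode == "product":
        return math.prod(f ** w for w, f in zip(weights, freqs))
    raise ValueError(f"unknown mode: {mode}")


def rank_library(library: list[str], table: list[dict[str, float]], mode: str = "sum") -> list[str]:
    """Return library members from highest to lowest frequency score."""
    return sorted(library, key=lambda s: frequency_score(s, table, mode), reverse=True)
```

Adding tends to reward members that are strong at most positions, while multiplying penalizes any single rare residue more heavily; the text allows either.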

[0264] In a preferred embodiment, the different protein members of the variant TR library may be chemically synthesized. This is particularly useful when the designed proteins are short, preferably less than 150 amino acids in length, with less than 100 amino acids being preferred and less than 50 amino acids being particularly preferred, although, as is known in the art, longer proteins can be made chemically or enzymatically. See, for example, Wilken et al., Curr. Opin. Biotechnol. 9:412-26 (1998), hereby expressly incorporated by reference.

[0265] In a preferred embodiment, particularly for longer proteins or proteins for which large samples are desired, the library sequences are used to create nucleic acids such as DNA which encode the member sequences and which can then be cloned into host cells, expressed and assayed, if desired. Thus, nucleic acids, and particularly DNA, can be made which encode each member protein sequence. This is done using well known procedures. The choice of codons, suitable expression vectors and suitable host cells will vary depending on a number of factors, and can be easily optimized as needed.
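As a minimal illustration of turning a member sequence into DNA, the sketch below back-translates a one-letter protein sequence using a single codon per amino acid. The codon choices are arbitrary valid codons picked for illustration; they are not an optimized codon usage for any particular host, which the text leaves to routine optimization.

```python
# One valid codon per amino acid (one-letter code); an illustrative choice, not codon-optimized.
CODONS = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP = "TAA"


def back_translate(protein: str, add_stop: bool = True) -> str:
    """Return a DNA coding sequence for `protein`, optionally ending in a stop codon."""
    dna = "".join(CODONS[aa] for aa in protein.upper())
    return dna + STOP if add_stop else dna
```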

[0266] In a preferred embodiment, multiple PCR reactions with pooled oligonucleotides are performed, as is generally described in U.S. Ser. No. 09/927,790, incorporated herein by reference. In this embodiment, overlapping oligonucleotides are synthesized which correspond to the full-length gene. Again, these oligonucleotides may represent all of the different amino acids at each variant position, or subsets.

[0267] In a preferred embodiment, these oligonucleotides are pooled in equal proportions and multiple PCR reactions are performed to create full-length sequences containing the combinations of mutations defined by the library. In addition, this may be done using error-prone PCR methods.

[0268] In a preferred embodiment, the different oligonucleotides are added in relative amounts corresponding to the probability distribution table. The multiple PCR reactions thus result in full-length sequences with the desired combinations of mutations in the desired proportions.
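Adding oligonucleotides in relative amounts that follow the probability table amounts to splitting a total quantity in proportion to the frequencies at each position. The unit (picomoles) and the function name in this sketch are assumptions for illustration.

```python
def pooling_amounts(position_freqs: dict[str, float], total_pmol: float) -> dict[str, float]:
    """Split a total oligo amount for one position in proportion to residue frequencies.

    Residues with zero frequency (e.g. removed by thresholding) get no oligo.
    """
    total_freq = sum(position_freqs.values())
    return {aa: total_pmol * p / total_freq
            for aa, p in position_freqs.items() if p > 0}
```

For a position with frequencies of 0.5, 0.3 and 0.2, a 100 pmol pool would contain about 50, 30 and 20 pmol of the three oligos.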

[0269] The total number of oligonucleotides needed is a function of the number of positions being mutated and the number of mutations being considered at these positions:

$$\text{(number of oligos for constant positions)} + M_1 + M_2 + M_3 + \cdots + M_n = \text{(total number of oligos required)},$$

[0270] where $M_n$ is the number of mutations considered at position n in the sequence.

[0271] In a preferred embodiment, each overlapping oligonucleotide comprises only one position to be varied; in alternate embodiments, the variant positions are too close together to allow this, and multiple variants per oligonucleotide are used to allow complete recombination of all the possibilities. That is, each oligo can contain the codon for a single position being mutated, or for more than one position being mutated. The multiple positions being mutated must be close in sequence to prevent the oligo length from being impractical. For multiple mutating positions on an oligonucleotide, particular combinations of mutations can be included in or excluded from the library by including or excluding the oligonucleotide encoding that combination. For example, as discussed herein, there may be correlations between variable regions; that is, when position X is a certain residue, position Y must (or must not) be a particular residue. These sets of variable positions are sometimes referred to herein as a “cluster”. When a cluster is composed of residues close together, and thus can reside on one oligonucleotide primer, the cluster can be set to the “good” correlations, eliminating the bad combinations that may decrease the effectiveness of the library. However, if the residues of the cluster are far apart in sequence, and thus will reside on different oligonucleotides for synthesis, it may be desirable either to set the residues to the “good” correlation or to eliminate them as variable residues entirely. In an alternative embodiment, the library may be generated in several steps, so that the cluster mutations only appear together. This procedure, i.e. identifying mutation clusters and then placing them on the same oligonucleotides, eliminating them from the library, or generating the library in several steps that preserve the clusters, can considerably enrich the experimental library with properly folded protein. Identification of clusters can be carried out in a number of ways, e.g. by using known pattern recognition methods, by comparing frequencies of occurrence of mutations, or by using energy analysis of the sequences to be experimentally generated (for example, if the energy of interaction is high, the positions are correlated). These correlations may be positional correlations (e.g. variable positions 1 and 2 always change together or never change together) or sequence correlations (e.g. if there is residue A at position 1, there is always residue B at position 2). See: Pattern Discovery in Biomolecular Data: Tools, Techniques, and Applications, edited by Jason T. L. Wang, Bruce A. Shapiro, Dennis Shasha, New York: Oxford University, 1999; Andrews, Harry C., Introduction to Mathematical Techniques in Pattern Recognition, New York: Wiley-Interscience [1972]; Applications of Pattern Recognition, editor K. S. Fu, Boca Raton, Fla.: CRC Press, 1982; Genetic Algorithms for Pattern Recognition, edited by Sankar K. Pal, Paul P. Wang, Boca Raton: CRC Press, c1996; Pandya, Abhijit S., Pattern Recognition with Neural Networks in C++, Abhijit S. Pandya, Robert B. Macy, Boca Raton, Fla.: CRC Press, 1996; Handbook of Pattern Recognition & Computer Vision, edited by C. H. Chen, L. F. Pau, P. S. P. Wang, 2nd ed., Singapore; River Edge, N.J.: World Scientific, c1999; Friedman, Introduction to Pattern Recognition: Statistical, Structural, Neural, and Fuzzy Logic Approaches, River Edge, N.J.: World Scientific, c1999, Series in Machine Perception and Artificial Intelligence, vol. 32; all of which are expressly incorporated by reference. In addition, programs used to search for consensus motifs can be used as well.
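As one concrete way of comparing frequencies of occurrence, the sketch below scores pairs of variable positions by mutual information over a set of sequences (for example, a rank-ordered list) and reports pairs above a cutoff as candidate clusters. Mutual information is a swapped-in choice, not a method named in the text, and the `min_mi` cutoff is a hypothetical parameter.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(seqs: list[str], i: int, j: int) -> float:
    """Mutual information (bits) between the residues at positions i and j."""
    n = len(seqs)
    count_i = Counter(s[i] for s in seqs)
    count_j = Counter(s[j] for s in seqs)
    count_ij = Counter((s[i], s[j]) for s in seqs)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        # Compare the observed joint frequency with the frequency expected if independent.
        mi += p_ab * math.log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


def find_clusters(seqs: list[str], positions: list[int], min_mi: float = 0.5) -> list[tuple[int, int]]:
    """Return position pairs whose mutual information meets the (hypothetical) cutoff."""
    return [(i, j) for i, j in combinations(positions, 2)
            if mutual_information(seqs, i, j) >= min_mi]
```

In the sequences "AKL", "AKV", "GEL" and "GEV", positions 0 and 1 always change together (a sequence correlation of the kind described above), while position 2 varies independently, so only the pair (0, 1) is reported.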

[0272] In addition, correlations and shuffling can be fixed or optimized by altering the design of the oligonucleotides; that is, by deciding where the oligonucleotides (primers) start and stop (e.g. where the sequences are “cut”). The start and stop sites of oligos can be set to maximize the number of clusters that appear in single oligonucleotides, thereby enriching the library with higher scoring sequences. Different oligonucleotide start and stop site options can be computationally modeled and ranked according to the number of clusters that are represented on single oligos, or the percentage of the resulting sequences consistent with the predicted library of sequences.
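Ranking oligonucleotide start and stop options by the number of clusters kept on a single oligo can be sketched as follows. For simplicity the sketch works in residue positions, represents each design as a list of positions at which a new oligo begins, and ignores the overlap regions between adjacent oligos; all of these are simplifying assumptions.

```python
def clusters_in_single_oligo(breakpoints: list[int], clusters: list[tuple[int, ...]]) -> int:
    """Count clusters whose positions all fall within one oligo segment.

    `breakpoints` are the positions at which a new oligo begins (a hypothetical
    representation of a start/stop design; overlaps are ignored).
    """
    def segment(pos: int) -> int:
        # Index of the oligo segment containing `pos`.
        return sum(1 for b in breakpoints if b <= pos)

    return sum(1 for cluster in clusters if len({segment(p) for p in cluster}) == 1)


def rank_designs(designs: list[list[int]], clusters: list[tuple[int, ...]]) -> list[list[int]]:
    """Order candidate designs from most to fewest clusters kept on a single oligo."""
    return sorted(designs, key=lambda bps: clusters_in_single_oligo(bps, clusters), reverse=True)
```

With clusters at positions (10, 14) and (30, 33), cutting at 12 and 31 splits both clusters, whereas cutting at 20 and 40 keeps each cluster on one oligo.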

[0273] The total number of oligonucleotides required increases when multiple mutable positions are encoded by a single oligonucleotide. The annealed regions are the ones that remain constant, i.e. have the sequence of the reference sequence.
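The oligonucleotide count from [0269] and the increase noted here can be checked with a short calculation. The sketch assumes that when several mutable positions share one oligonucleotide, one oligo is made for every combination of their mutations, so that oligo contributes the product of its M_i values rather than their sum; with one mutable position per oligo it reduces to the sum formula in [0269].

```python
import math


def total_oligos(n_constant: int, variable_oligos: list[list[int]]) -> int:
    """Total oligonucleotides for a pooled-oligo library.

    n_constant: oligos that cover only constant positions.
    variable_oligos: for each oligo carrying mutable positions, the number of
    mutations M_i considered at each position on that oligo. Assumes one oligo is
    made per combination of mutations on a shared oligo (hence the product).
    """
    return n_constant + sum(math.prod(ms) for ms in variable_oligos)
```

For example, with 5 constant oligos and positions carrying 3, 4 and 2 mutations on separate oligos, the total is 5 + 3 + 4 + 2 = 14; if the 3- and 4-mutation positions share one oligo, it becomes 5 + 12 + 2 = 19.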

[0274] Oligonucleotides with insertions or deletions of codons can be used to create a library expressing proteins of different lengths. In particular, computational sequence screening for insertions or deletions can result in secondary libraries defining proteins of different lengths, which can be expressed by a library of pooled oligonucleotides of different lengths.

[0275] In a preferred embodiment, the variant TR library is generated by shuffling the family (e.g. a set of variants); that is, some set of the top sequences (if a rank-ordered list is used) can be shuffled, either with or without error-prone PCR. “Shuffling” in this context means a recombination of related sequences, generally in a random way. It can include “shuffling” as defined and exemplified in U.S. Pat. Nos. 5,830,721; 5,811,238; 5,605,793; 5,837,458 and PCT US/19256, all of which are expressly incorporated by reference in their entirety. This set of sequences can also be an artificial set; for example, from a probability table (for example, one generated using SCMF) or a Monte Carlo set. Similarly, the “family” can be the top 10 and the bottom 10 sequences, the top 100 sequences, etc. This may also be done using error-prone PCR.

[0276] Thus, in a preferred embodiment, in silico shuffling is done using the computational methods described herein. That is, starting with either two libraries or two sequences, random recombinations of the sequences can be generated and evaluated.
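A minimal in silico shuffling step for two sequences might look like the sketch below, which recombines two aligned, equal-length parents at randomly chosen crossover points. The crossover model is an assumption; the resulting children would then be evaluated, for example with a frequency score or the PDA scoring functions.

```python
import random


def shuffle_pair(parent_a: str, parent_b: str, n_crossovers: int, rng: random.Random) -> str:
    """Recombine two aligned, equal-length parents at randomly chosen crossover points."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to the same length")
    # Distinct crossover points strictly inside the sequence, so every segment is non-empty.
    points = sorted(rng.sample(range(1, len(parent_a)), n_crossovers))
    parents = (parent_a, parent_b)
    child, source, start = [], 0, 0
    for point in points + [len(parent_a)]:
        child.append(parents[source][start:point])
        source, start = 1 - source, point  # switch parent at each crossover
    return "".join(child)
```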

[0277] In a preferred embodiment, error-prone PCR is used to generate the variant TR library. See U.S. Pat. Nos. 5,605,793, 5,811,238, and 5,830,721, all of which are hereby incorporated by reference. This can be done on the optimal sequence, on top members of the library, or on some other artificial set or family. In this embodiment, the gene for the optimal sequence found in the computational screen of the primary library can be synthesized. Error-prone PCR is then performed on the optimal sequence gene in the presence of oligonucleotides that code for the mutations at the variant positions of the library (bias oligonucleotides). The addition of the oligonucleotides will create a bias favoring the incorporation of the mutations in the library. Alternatively, only oligonucleotides for certain mutations may be used to bias the library.

[0278] In a preferred embodiment, gene shuffling with error-prone PCR can be performed on the gene for the optimal sequence, in the presence of bias oligonucleotides, to create a DNA sequence library that reflects the proportion of the mutations found in the variant TR library. The bias oligonucleotides can be chosen in a variety of ways: they can be chosen on the basis of their frequency, i.e. oligonucleotides encoding high mutational frequency positions can be used; alternatively, oligonucleotides containing the most variable positions can be used, such that the diversity is increased; if the secondary library is ranked, some number of top scoring positions can be used to generate bias oligonucleotides; random positions may be chosen; a few top scoring and a few low scoring ones may be chosen; etc. What is important is to generate new sequences based on preferred variable positions and sequences.
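The first selection strategy, choosing bias oligonucleotides at high mutational frequency positions, can be sketched as below. Defining a position's mutational frequency as one minus the frequency of the wild type residue is an assumption made for this illustration.

```python
def mutational_frequency(freqs: dict[str, float], wild_type_residue: str) -> float:
    """Fraction of the library that differs from the wild type residue at this position."""
    return 1.0 - freqs.get(wild_type_residue, 0.0)


def pick_bias_positions(table: list[dict[str, float]], wild_type: str, k: int) -> list[int]:
    """Return the k positions with the highest mutational frequency, in sequence order."""
    ranked = sorted(range(len(table)),
                    key=lambda i: mutational_frequency(table[i], wild_type[i]),
                    reverse=True)
    return sorted(ranked[:k])
```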

[0279] In a preferred embodiment, PCR using a wild type gene or other gene can be used, as is generally described in U.S. Ser. No. 09/927,790, incorporated herein by reference. In this embodiment, a starting gene is used; generally, although this is not required, the gene is the wild type gene. In some cases it may be the gene encoding the globally optimized sequence, or any other sequence of the list, or a consensus sequence obtained, e.g., from aligning homologous sequences from different organisms. In this embodiment, oligonucleotides are used that correspond to the variant positions and contain the different amino acids of the library. PCR is done using PCR primers at the termini, as is known in the art. This provides two benefits: first, it generally requires fewer oligonucleotides and can result in fewer errors; second, if the wild type gene is used, it need not be synthesized.

[0280] In addition, there are several other techniques that can be used, as exemplified in the figures. In a preferred embodiment, ligation of PCR products is performed.

[0281] In a preferred embodiment, a variety of additional steps may be performed on the variant TR library; for example, further computational processing can occur, different variant TR libraries can be recombined, or cutoffs from different libraries can be combined. In a preferred embodiment, a variant TR library may be computationally remanipulated to form an additional variant TR library (sometimes referred to herein as “tertiary libraries”). For example, any of the variant TR library sequences may be chosen for a second round of PDA, by freezing or fixing some or all of the changed positions in the first library. Alternatively, only changes seen in the last probability distribution table are allowed. Alternatively, the stringency of the probability table may be altered, either by increasing or decreasing the cutoff for inclusion. Similarly, the variant TR library may be recombined experimentally after the first round; for example, the best gene or genes from the first screen may be taken and gene assembly redone (using techniques outlined below, multiple PCR, error-prone PCR, shuffling, etc.).

[0282] Alternatively, fragments from one or more good genes can be used to change probabilities at some positions. This biases the search to an area of sequence space found in the first round of computational and experimental screening.

[0283] In a preferred embodiment, a tertiary library can be generated by combining different variant TR libraries. For example, a probability distribution table from a first variant TR library can be generated and recombined, either computationally or experimentally, as outlined herein. A PDA variant TR library may be combined with a sequence alignment variant TR library, and either recombined (again, computationally or experimentally) or just the cutoffs from each joined to make a new tertiary library. The top sequences from several libraries can be recombined. Sequences from the top of a library can be combined with sequences from the bottom of the library to more broadly sample sequence space, or only sequences distant from the top of the library can be combined. Variant TR libraries that analyzed different parts of a protein can be combined into a tertiary library that treats the combined parts of the protein.

[0284] In a preferred embodiment, a tertiary library can be generated using correlations in a variant TR library. That is, a residue at a first variable position may be correlated to a residue at a second variable position (or correlated to residues at additional positions as well). For example, two variable positions may sterically or electrostatically interact, such that if the first residue is X, the second residue must be Y. This may be either a positive or negative correlation.

[0285] Using the nucleic acids of the present invention that encode candidate variant proteins or candidate variant library members, a variety of expression vectors are made. The expression vectors may be either self-replicating extrachromosomal vectors or vectors which integrate into a host genome. Generally, these expression vectors include transcriptional and translational regulatory nucleic acid operably linked to the nucleic acid encoding the library protein. The term “control sequences” refers to DNA sequences necessary for the expression of an operably linked coding sequence in a particular host organism. The control sequences that are suitable for prokaryotes, for example, include a promoter, optionally an operator sequence, and a ribosome binding site. Eukaryotic cells are known to utilize promoters, polyadenylation signals, and enhancers.

[0286] Nucleic acid is “operably linked” when it is placed into a functional relationship with another nucleic acid sequence. For example, DNA for a presequence or secretory leader is operably linked to DNA for a polypeptide if it is expressed as a preprotein that participates in the secretion of the polypeptide; a promoter or enhancer is operably linked to a coding sequence if it affects the transcription of the sequence; or a ribosome binding site is operably linked to a coding sequence if it is positioned so as to facilitate translation. Generally, “operably linked” means that the DNA sequences being linked are contiguous and, in the case of a secretory leader, contiguous and in reading phase. However, enhancers do not have to be contiguous. Linking is accomplished by ligation at convenient restriction sites. If such sites do not exist, synthetic oligonucleotide adaptors or linkers are used in accordance with conventional practice. The transcriptional and translational regulatory nucleic acid will generally be appropriate to the host cell used to express the library protein, as will be appreciated by those in the art; for example, transcriptional and translational regulatory nucleic acid sequences from Bacillus are preferably used to express the library protein in Bacillus. Numerous types of appropriate expression vectors, and suitable regulatory sequences, are known in the art for a variety of host cells.

[0287] In general, the transcriptional and translational regulatory sequences may include, but are not limited to, promoter sequences, ribosomal binding sites, transcriptional start and stop sequences, translational start and stop sequences, and enhancer or activator sequences. In a preferred embodiment, the regulatory sequences include a promoter and transcriptional start and stop sequences.

[0288] Promoter sequences include constitutive and inducible promoter sequences. The promoters may be naturally occurring, hybrid or synthetic promoters. Hybrid promoters, which combine elements of more than one promoter, are also known in the art, and are useful in the present invention.

[0289] In addition, the expression vector may comprise additional elements. For example, the expression vector may have two replication systems, thus allowing it to be maintained in two organisms, for example in mammalian or insect cells for expression and in a prokaryotic host for cloning and amplification. Furthermore, for integrating expression vectors, the expression vector contains at least one sequence homologous to the host cell genome, and preferably two homologous sequences which flank the expression construct. The integrating vector may be directed to a specific locus in the host cell by selecting the appropriate homologous sequence for inclusion in the vector. Constructs for integrating vectors and appropriate selection and screening protocols are well known in the art and are described in, e.g., Mansour et al., Cell, 51:503 (1988) and Murray, Gene Transfer and Expression Protocols, Methods in Molecular Biology, Vol. 7 (Clifton: Humana Press, 1991).

[0290] In addition, in a preferred embodiment, the expression vector contains a selection gene to allow the selection of transformed host cells containing the expression vector; particularly in the case of mammalian cells, the selection gene also ensures the stability of the vector, since cells which do not contain the vector will generally die. Selection genes are well known in the art and will vary with the host cell used. By “selection gene” herein is meant any gene which encodes a gene product that confers resistance to a selection agent. Suitable selection agents include, but are not limited to, neomycin (or its analog G418), blasticidin S, histidinol, bleomycin, puromycin, hygromycin B, and other drugs.

[0291] In a preferred embodiment, the expression vector contains an RNA splicing sequence upstream or downstream of the gene to be expressed in order to increase the level of gene expression. See Barret et al., Nucleic Acids Res. 1991; Groos et al., Mol. Cell. Biol. 1987; and Budiman et al., Mol. Cell. Biol. 1988.

[0292] A preferred expression vector system is a retroviral vector system such as is generally described in Mann et al., Cell, 33:153-9 (1983); Pear et al., Proc. Natl. Acad. Sci. U.S.A., 90(18):8392-6 (1993); Kitamura et al., Proc. Natl. Acad. Sci. U.S.A., 92:9146-50 (1995); Kinsella et al., Human Gene Therapy, 7:1405-13; Hofmann et al., Proc. Natl. Acad. Sci. U.S.A., 93:5185-90; Choate et al., Human Gene Therapy, 7:2247 (1996); PCT/US97/01019 and PCT/US97/01048, and references cited therein, all of which are hereby expressly incorporated by reference.

[0293] The candidate variant library proteins of the present invention are produced by culturing a host cell transformed with nucleic acid, preferably an expression vector, containing nucleic acid encoding a library protein, under the appropriate conditions to induce or cause expression of the library protein. The conditions appropriate for candidate variant library protein expression will vary with the choice of the expression vector and the host cell, and will be easily ascertained by one skilled in the art through routine experimentation. For example, the use of constitutive promoters in the expression vector will require optimizing the growth and proliferation of the host cell, while the use of an inducible promoter requires the appropriate growth conditions for induction. In addition, in some embodiments, the timing of the harvest is important. For example, the baculoviral systems used in insect cell expression are lytic viruses, and thus harvest time selection can be crucial for product yield.

[0294] As will be appreciated by those in the art, the type of cells used in the present invention can vary widely. Basically, a wide variety of appropriate host cells can be used, including yeast, bacteria, archaebacteria, fungi, and insect, plant, and animal cells, including mammalian cells. Of particular interest are Drosophila melanogaster cells, Saccharomyces cerevisiae and other yeasts, E. coli, Bacillus subtilis, SF9 cells, C129 cells, 293 cells, Neurospora, BHK, CHO, COS, and HeLa cells, fibroblasts, Schwannoma cell lines, immortalized mammalian myeloid and lymphoid cell lines, Jurkat cells, mast cells and other endocrine and exocrine cells, and neuronal cells. See the ATCC cell line catalog, hereby expressly incorporated by reference. In addition, expression of the secondary libraries in phage display systems, such as are well known in the art, is particularly preferred, especially when the secondary library comprises random peptides. In one embodiment, the cells may be genetically engineered, that is, contain exogenous nucleic acid, for example, to contain target molecules.

[0295] In a preferred embodiment, the candidate variant protein or candidate variant library proteins are expressed in mammalian cells. Any mammalian cells may be used, with mouse, rat, primate and human cells being particularly preferred, although, as will be appreciated by those in the art, modifications of the system by pseudotyping allow all eukaryotic cells to be used, preferably higher eukaryotes. As is more fully described below, a screen will be set up such that the cells exhibit a selectable phenotype in the presence of a random library member. Cell types implicated in a wide variety of disease conditions are particularly useful, so long as a suitable screen may be designed to allow the selection of cells that exhibit an altered phenotype as a consequence of the presence of a library member within the cell.

[0296] Accordingly, suitable mammalian cell types include, but are not limited to, tumor cells of all types (particularly melanoma, myeloid leukemia, carcinomas of the lung, breast, ovaries, colon, kidney, prostate, pancreas and testes), cardiomyocytes, endothelial cells, epithelial cells, lymphocytes (T cells and B cells), mast cells, eosinophils, vascular intimal cells, hepatocytes, leukocytes including mononuclear leukocytes, stem cells such as hematopoietic, neural, skin, lung, kidney, liver and myocyte stem cells (for use in screening for differentiation and de-differentiation factors), osteoclasts, chondrocytes and other connective tissue cells, keratinocytes, melanocytes, liver cells, kidney cells, and adipocytes. Suitable cells also include known research cells, including, but not limited to, Jurkat T cells, NIH3T3 cells, CHO, Cos, etc. See the ATCC cell line catalog, hereby expressly incorporated by reference.

[0297] Mammalian expression systems are also known in the art, and include retroviral systems. A mammalian promoter is any DNA sequence capable of binding mammalian RNA polymerase and initiating the downstream (3′) transcription of a coding sequence for library protein into mRNA. A promoter will have a transcription initiating region, which is usually placed proximal to the 5′ end of the coding sequence, and a TATA box, usually located 25-30 base pairs upstream of the transcription initiation site. The TATA box is thought to direct RNA polymerase II to begin RNA synthesis at the correct site. A mammalian promoter will also contain an upstream promoter element (enhancer element), typically located within 100 to 200 base pairs upstream of the TATA box. An upstream promoter element determines the rate at which transcription is initiated and can act in either orientation. Of particular use as mammalian promoters are the promoters from mammalian viral genes, since the viral genes are often highly expressed and have a broad host range. Examples include the SV40 early promoter, mouse mammary tumor virus LTR promoter, adenovirus major late promoter, herpes simplex virus promoter, and the CMV promoter.

[0298] Typically, transcription termination and polyadenylation sequences recognized by mammalian cells are regulatory regions located 3′ to the translation stop codon and thus, together with the promoter elements, flank the coding sequence. The 3′ terminus of the mature mRNA is formed by site-specific post-transcriptional cleavage and polyadenylation. Examples of transcription terminator and polyadenylation signals include those derived from SV40.

[0299] The methods of introducing exogenous nucleic acid into mammalian hosts, as well as other hosts, are well known in the art, and will vary with the host cell used. Techniques include dextran-mediated transfection, calcium phosphate precipitation, polybrene-mediated transfection, protoplast fusion, electroporation, viral infection, encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei.

[0300] In a preferred embodiment, candidate variant proteins or candidate variant library proteins are expressed in bacterial systems. Bacterial expression systems are well known in the art.

[0301] A suitable bacterial promoter is any nucleic acid sequence capable of binding bacterial RNA polymerase and initiating the downstream (3′) transcription of the coding sequence of library protein into mRNA. A bacterial promoter has a transcription initiation region which is usually placed proximal to the 5′ end of the coding sequence. This transcription initiation region typically includes an RNA polymerase binding site and a transcription initiation site. Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences. Examples include promoter sequences derived from sugar-metabolizing enzymes, such as those acting on galactose, lactose and maltose, and sequences derived from biosynthetic enzymes such as those of the tryptophan pathway. Promoters from bacteriophage may also be used and are known in the art. In addition, synthetic promoters and hybrid promoters are also useful; for example, the tac promoter is a hybrid of the trp and lac promoter sequences. Furthermore, a bacterial promoter can include naturally occurring promoters of non-bacterial origin that have the ability to bind bacterial RNA polymerase and initiate transcription.

[0302] In addition to a functioning promoter sequence, an efficient ribosome binding site is desirable. In E. coli, the ribosome binding site is called the Shine-Dalgarno (SD) sequence and includes an initiation codon and a sequence 3-9 nucleotides in length located 3-11 nucleotides upstream of the initiation codon.
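The spacing rule above can be expressed as a small search. The following Python sketch is illustrative only: it assumes the commonly cited E. coli SD core AGGAGG as the reference motif (which caps a match at 6 nucleotides, below the 9-nucleotide upper bound in the text), and the function name `find_shine_dalgarno` is hypothetical. It looks for the longest sub-sequence of that core ending 3-11 nucleotides before a given initiation codon.

```python
# Canonical E. coli Shine-Dalgarno core; used here as an assumed reference motif.
SD_CONSENSUS = "AGGAGG"


def find_shine_dalgarno(mrna: str, start_codon: int, min_len: int = 3,
                        min_gap: int = 3, max_gap: int = 11):
    """Look for a sub-sequence of SD_CONSENSUS ending min_gap..max_gap nt
    before the initiation codon at 0-based index `start_codon`.

    Longer matches are tried first. Returns (start, motif, gap) or None, where
    `gap` is the number of nucleotides between the motif and the start codon.
    """
    mrna = mrna.upper().replace("U", "T")  # accept RNA or DNA notation
    for length in range(len(SD_CONSENSUS), min_len - 1, -1):
        for offset in range(len(SD_CONSENSUS) - length + 1):
            motif = SD_CONSENSUS[offset:offset + length]
            for gap in range(min_gap, max_gap + 1):
                end = start_codon - gap  # exclusive end of the candidate motif
                start = end - length
                if start >= 0 and mrna[start:end] == motif:
                    return start, motif, gap
    return None


# Hypothetical leader: AGGAGG followed by 7 nt of spacer, then ATG at index 17.
leader = "CCCC" + "AGGAGG" + "CCCCCCC" + "ATG" + "AAA"
print(find_shine_dalgarno(leader, start_codon=17))  # (4, 'AGGAGG', 7)
```

Preferring the longest match reflects the general expectation that stronger complementarity to the 16S rRNA 3′ end favors initiation; a real design tool would also score partial matches and local secondary structure.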

[0303] The expression vector may also include a signal peptide sequence that provides for secretion of the library protein in bacteria. The signal sequence typically encodes a signal peptide composed of hydrophobic amino acids, which directs the secretion of the protein from the cell, as is well known in the art. The protein is either secreted into the growth media (gram-positive bacteria) or into the periplasmic space, located between the inner and outer membrane of the cell (gram-negative bacteria).

[0304] The bacterial expression vector may also include a selectable marker gene to allow for the selection of bacterial strains that have been transformed. Suitable selection genes include genes which render the bacteria resistant to drugs such as ampicillin, chloramphenicol, erythromycin, kanamycin, neomycin and tetracycline. Selectable markers also include biosynthetic genes, such as those in the histidine, tryptophan and leucine biosynthetic pathways.

[0305] These components are assembled into expression vectors. Expression vectors for bacteria are well known in the art, and include vectors for Bacillus subtilis, E. coli, Streptococcus cremoris, and Streptomyces lividans, among others.

[0306] The bacterial expression vectors are transformed into bacterial host cells using techniques well known in the art, such as calcium chloride treatment, electroporation, and others.

[0307] In one embodiment, candidate variant proteins are produced in insect cells. Expression vectors for the transformation of insect cells, and in particular baculovirus-based expression vectors, are well known in the art and are described, e.g., in O'Reilly et al., Baculovirus Expression Vectors: A Laboratory Manual (New York: Oxford University Press, 1994).

[0308] In a preferred embodiment, candidate variant protein is produced in yeast cells. Yeast expression systems are well known in the art, and include expression vectors for Saccharomyces cerevisiae, Candida albicans and C. maltosa, Hansenula polymorpha, Kluyveromyces fragilis and K. lactis, Pichia guilliermondii and P. pastoris, Schizosaccharomyces pombe, and Yarrowia lipolytica. Preferred promoter sequences for expression in yeast include the inducible GAL1,10 promoter, the promoters from alcohol dehydrogenase, enolase, glucokinase, glucose-6-phosphate isomerase, glyceraldehyde-3-phosphate dehydrogenase, hexokinase, phosphofructokinase, 3-phosphoglycerate mutase, pyruvate kinase, and the acid phosphatase gene. Yeast selectable markers include ADE2, HIS4, LEU2, TRP1, and ALG7, which confers resistance to tunicamycin; the neomycin phosphotransferase gene, which confers resistance to G418; and the CUP1 gene, which allows yeast to grow in the presence of copper ions.

[0309] In a preferred embodiment, the candidate variant protein or candidate variant library proteins are expressed in plant cells. Gene sequences intended for expression in transgenic plants are first assembled in expression cassettes adjacent to a suitable promoter expressible in plants. The expression cassettes may also include any further sequences required or selected for the expression of the transgene. Such sequences include, but are not restricted to, transcription terminators, extraneous sequences to enhance expression such as introns, enhancer sequences, and sequences intended for the targeting of the gene product to specific organelles and cell compartments. These expression cassettes can then be easily transferred to the plant transformation vectors described below. The following is a description of various components of typical expression cassettes.

[0310] The selection of the promoter used in expression cassettes determines the spatial and temporal expression pattern of the transgene in the transgenic plant. Selected promoters express transgenes in specific cell types (such as leaf epidermal cells, mesophyll cells, root cortex cells) or in specific tissues or organs (roots, leaves or flowers, for example), and the selection of a promoter is therefore based on the desired location of accumulation of the gene product. In a preferred embodiment of the invention, a seed-specific promoter is used for expression of an oleosin-TR fusion protein or an oleosin-hybrid TR/TR-reductase fusion protein. In a most preferred embodiment, the seed-specific promoter is a phaseolin promoter.

[0311] Promoters vary in their ability to promote transcription. Depending upon the host cell system utilized, any one of a number of suitable promoters known in the art can be used. For constitutive expression, the CaMV 35S promoter, the rice actin promoter, or the ubiquitin promoter may be used. Alternatively, an inducible promoter may be selected to drive expression of the gene under various inducing conditions. For chemically inducible expression, the inducible PR-1 promoter from tobacco or Arabidopsis may be used (see, e.g., U.S. Pat. No. 5,689,044).

[0312] A variety of transcriptional terminators are available for use in nuclear gene expression cassettes; they are responsible for the termination of transcription beyond the transgene and for its correct polyadenylation. Appropriate transcriptional terminators are those that are known to function in plants and include the CaMV 35S terminator, the tml terminator, the nopaline synthase (nos) terminator and the pea rbcS E9 terminator. These can be used in both monocotyledonous and dicotyledonous plants. In a preferred embodiment, a phaseolin transcriptional terminator is used. Expression in plastids may not require termination, but may require correct 5′ and 3′ signals for translational initiation, elongation and RNA stability.

[0313] Numerous sequences have been found to enhance gene expression from within the transcriptional unit, and these sequences can be used in conjunction with the genes of this invention to increase their expression in transgenic plants. For example, various intron sequences, such as introns of the maize Adh1 gene, have been shown to enhance expression, particularly in monocotyledonous cells. In addition, a number of non-translated leader sequences derived from viruses are also known to enhance expression, and these are particularly effective in dicotyledonous cells.

[0314] For their expression in transgenic plants, the coding sequences of the DNA molecules used may require modification and optimization, particularly when the DNA molecules are of prokaryotic origin. It is known in the art that all organisms have specific preferences for codon usage, and the codons in the nucleotide sequence of the DNA molecules of the present invention can be changed to conform with specific plant preferences while maintaining the amino acids encoded thereby. High expression in plants is best achieved from coding sequences which have at least 35% GC content, and preferably more than 45%. Nucleotide sequences which have low GC contents may express poorly due to the existence of ATTTA motifs, which may destabilize messages, and AATAAA motifs, which may cause inappropriate polyadenylation. Although preferred gene sequences may be adequately expressed in both monocotyledonous and dicotyledonous plant species, sequences can be modified to account for the specific codon preferences and GC content preferences of monocotyledons or dicotyledons, as these preferences have been shown to differ (Murray et al. (1989) Nucl Acids Res 17: 477-498). In addition, the nucleotide sequences are screened for the existence of illegitimate splice sites which cause message truncation. All changes required to be made within the nucleotide sequences, such as those described above, are made using well known techniques of site-directed mutagenesis, PCR, and synthetic gene construction using, for example, the methods described in the published patent applications EP 0 385 962, EP 0 359 472, and WO 93/07278, the entire disclosures of which are hereby incorporated by reference.
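The GC-content thresholds and the two destabilizing motifs named in this paragraph lend themselves to a simple automated screen. The following is a minimal Python sketch, assuming the coding sequence is available as a plain DNA string; the function names are hypothetical, and the screen deliberately does not attempt codon optimization or splice-site prediction, which require organism-specific codon tables and models.

```python
import re

# Thresholds and motifs taken from the paragraph above.
MIN_GC = 0.35
PREFERRED_GC = 0.45
UNFAVORABLE_MOTIFS = ("ATTTA", "AATAAA")


def gc_content(seq: str) -> float:
    """Fraction of G + C bases in `seq` (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_motifs(seq: str, motifs=UNFAVORABLE_MOTIFS) -> dict:
    """Map each motif to the 0-based positions where it occurs (overlaps included)."""
    seq = seq.upper()
    return {m: [hit.start() for hit in re.finditer(f"(?={m})", seq)] for m in motifs}


def screen_coding_sequence(seq: str) -> dict:
    """Summarize GC content and flag ATTTA / AATAAA motifs in a coding sequence."""
    gc = gc_content(seq)
    return {
        "gc": gc,
        "meets_minimum": gc >= MIN_GC,       # at least 35% GC
        "meets_preferred": gc > PREFERRED_GC,  # more than 45% GC
        "motifs": find_motifs(seq),
    }


report = screen_coding_sequence("CCATTTACCAATAAACC")
print(report["motifs"])  # {'ATTTA': [2], 'AATAAA': [9]}
```

Any flagged motif position would then be a candidate for a synonymous codon change, made by the site-directed mutagenesis or synthetic gene construction techniques described above.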

[0315] For efficient initiation of translation, sequences adjacent to the initiating methionine may require modification. For example, they can be modified by the inclusion of sequences known to be effective in plants. Joshi has suggested an appropriate consensus for plants (Nuc Acids Res (1987) 15:6643-6653), and a further consensus translation initiator (Clontech 1993/1994 catalog, page 210) may be included. These consensus sequences are suitable for use with the nucleotide sequences of this invention. The sequences are incorporated into constructions including the nucleotide sequence, up to and including the ATG (whilst leaving the second amino acid unmodified), or alternatively up to and including the GTC subsequent to the ATG (with the possibility of modifying the second amino acid of the transgene).

[0316] Various mechanisms for targeting gene products are known to exist in plants, and the sequences controlling the functioning of these mechanisms have been characterized in some detail. For example, the targeting of gene products to the chloroplast is controlled by a transit sequence found at the amino terminal end of various proteins, which is cleaved during chloroplast import to yield the mature protein (Comai et al. (1988) J Biol Chem 263: 15104-15109). Other gene products are localized to other organelles such as the mitochondrion and the peroxisome (Unger et al. (1989) Plant Mol Biol 13:411-418). The cDNAs encoding these products can be manipulated to target heterologous gene products to these organelles. In addition, sequences have been characterized which cause the targeting of gene products to other cell compartments.

[0317] Amino terminal sequences are responsible for targeting to the ER, the apoplast, and extracellular secretion from aleurone cells (Koehler & Ho (1990) Plant Cell 2:769-783). Additionally, amino terminal sequences in conjunction with carboxy terminal sequences are responsible for vacuolar targeting of gene products (Shinshi et al. (1990) Plant Mol Biol 14:357-368). By the fusion of the appropriate targeting sequences described above to transgene sequences of interest, it is possible to direct the transgene product to the desired organelle or cell compartment.

[0318] In another preferred embodiment, the DNA molecules of this invention are directly transformed into the plastid genome. Plastid transformation technology is described extensively in U.S. Pat. Nos. 5,451,513, 5,545,817, 5,545,818 and 5,576,198; in PCT application nos. WO 95/16783 and WO 97/32977; and in McBride et al., Proc Natl Acad Sci USA 91: 7301-7305 (1994), the entire disclosures of all of which are hereby incorporated by reference. In one embodiment, plastid transformation is achieved via biolistics, first carried out in the unicellular green alga Chlamydomonas reinhardtii (Boynton et al. (1988) Science 240:1534-1537) and then extended to Nicotiana tabacum (Svab et al. (1990) Proc Natl Acad Sci USA 87:8526-8530), combined with selection for cis-acting antibiotic resistance loci (spectinomycin or streptomycin resistance) or complementation of non-photosynthetic mutant phenotypes.

[0319] In another embodiment, tobacco plastid transformation is carried out by particle bombardment of leaf or callus tissue, or polyethylene glycol (PEG)-mediated uptake of plasmid DNA by protoplasts, using cloned plastid DNA flanking a selectable antibiotic resistance marker. The 1 to 1.5 kb flanking regions, termed targeting sequences, facilitate homologous recombination with the plastid genome and allow the replacement or modification of specific regions of the 156 kb tobacco plastid genome. Initially, point mutations in the plastid 16S rDNA and rps12 genes conferring resistance to spectinomycin and/or streptomycin were utilized as selectable markers for transformation (Svab et al. (1990) Proc Natl Acad Sci USA 87:8526-8530; Staub et al. (1992) Plant Cell 4:39-45, the entire disclosures of which are hereby incorporated by reference), resulting in stable homoplasmic transformants at a frequency of approximately one per 100 bombardments of target leaves. The presence of cloning sites between these markers allows creation of a plastid targeting vector for introduction of foreign genes (Staub et al. (1993) EMBO J 12:601-606, the entire disclosure of which is hereby incorporated by reference). Substantial increases in transformation frequency were obtained by replacement of the recessive rRNA or r-protein antibiotic resistance genes with a dominant selectable marker, the bacterial aadA gene encoding the spectinomycin-detoxifying enzyme aminoglycoside-3′-adenyltransferase (Svab et al. (1993) Proc Natl Acad Sci USA 90: 913-917, the entire disclosure of which is hereby incorporated by reference). Previously, this marker had been used successfully for high-frequency transformation of the plastid genome of the green alga Chlamydomonas reinhardtii (Goldschmidt-Clermont, M. (1991) Nucl Acids Res 19, 4083-4089, the entire disclosure of which is hereby incorporated by reference).
Recently, plastid transformation of protoplasts from tobacco and the moss Physcomitrella has been attained using PEG-mediated DNA uptake (O'Neill et al. (1993) Plant J 3:729-738; Koop et al. (1996) Planta 199:193-201, the entire disclosures of which are hereby incorporated by reference).

[0320] Both particle bombardment and protoplast transformation are appropriate in the context of the present invention. Plastid transformation of oilseed plants has been successfully carried out in the genera Arabidopsis and Brassica (Sikdar et al. (1998) Plant Cell Rep 18:20-24; PCT Application WO 00/39313, the entire disclosures of which are hereby incorporated by reference).

[0321] A DNA molecule of the present invention is inserted into a plastid expression cassette including a promoter capable of expressing the DNA molecule in plant plastids. A preferred promoter capable of expression in a plant plastid is, for example, a promoter isolated from the 5′ flanking region upstream of the coding region of a plastid gene, which may come from the same or a different species, and the native product of which is typically found in a majority of plastid types, including those present in non-green tissues. Gene expression in plastids differs from nuclear gene expression and is related to gene expression in prokaryotes (Stern et al. (1997) Trends in Plant Sci 2:308-315, the entire disclosure of which is hereby incorporated by reference).

[0322] Plastid promoters generally contain the −35 and −10 elements typical of prokaryotic promoters. Some plastid promoters, called PEP (plastid-encoded RNA polymerase) promoters, are recognized by an E. coli-like RNA polymerase mostly encoded in the plastid genome, while other plastid promoters, called NEP promoters, are recognized by a nuclear-encoded RNA polymerase. Both types of plastid promoters are suitable for the present invention. Examples of plastid promoters include promoters of clpP genes, such as the tobacco clpP gene promoter (WO 97/06250, the entire disclosure of which is hereby incorporated by reference) and the Arabidopsis clpP gene promoter (U.S. application Ser. No. 09/038,878, the entire disclosure of which is hereby incorporated by reference). Another promoter capable of driving expression of a DNA molecule in plant plastids comes from the regulatory region of the plastid 16S ribosomal RNA operon (Harris et al. (1994) Microbiol Rev 58:700-754; Shinozaki et al. (1986) EMBO J 5:2043-2049, the entire disclosures of both of which are hereby incorporated by reference). Other examples of promoters capable of driving expression of a DNA molecule in plant plastids include a psbA promoter or an rbcL promoter. A plastid expression cassette preferably further includes a plastid gene 3′ untranslated sequence (3′ UTR) operatively linked to a DNA molecule of the present invention. The role of the 3′ untranslated sequence is preferably to direct the 3′ processing of the transcribed RNA rather than termination of transcription. Preferably, the 3′ UTR is a plastid rps16 gene 3′ untranslated sequence or the Arabidopsis plastid psbA gene 3′ untranslated sequence. In a further preferred embodiment, a plastid expression cassette includes a poly-G tract instead of a 3′ untranslated sequence. A plastid expression cassette also preferably further includes a 5′ untranslated sequence (5′ UTR) functional in plant plastids, operatively linked to a DNA molecule of the present invention.

[0323] A plastid expression cassette is included in a plastid transformation vector, which preferably further includes flanking regions for integration into the plastid genome by homologous recombination. The plastid transformation vector may optionally include at least one plastid origin of replication. The present invention also encompasses a plant plastid transformed with such a plastid transformation vector, wherein the DNA molecule is expressible in the plant plastid. The invention also encompasses a plant or plant cell, including the progeny thereof, comprising this plant plastid. In a preferred embodiment, the plant or plant cell, including the progeny thereof, is homoplasmic for transgenic plastids.

[0324] Other promoters capable of driving expression of a DNA molecule in plant plastids include transactivator-regulated promoters, preferably heterologous with respect to the plant or to the subcellular organelle or component of the plant cell in which expression is effected. In these cases, the DNA molecule encoding the transactivator is inserted into an appropriate nuclear expression cassette which is transformed into the plant nuclear DNA. The transactivator is targeted to plastids using a plastid transit peptide. The transactivator and the transactivator-driven DNA molecule are brought together either by crossing a selected plastid-transformed line with a transgenic line containing a DNA molecule encoding the transactivator supplemented with a plastid-targeting sequence and operably linked to a nuclear promoter, or by directly transforming a plastid transformation vector containing the desired DNA molecule into a transgenic line containing a DNA molecule encoding the transactivator supplemented with a plastid-targeting sequence operably linked to a nuclear promoter. If the nuclear promoter is an inducible promoter, in particular a chemically inducible promoter, expression of the DNA molecule in the plastids of plants is activated by foliar application of a chemical inducer. Such an inducible transactivator-mediated plastid expression system is preferably tightly regulatable, with no detectable expression prior to induction and exceptionally high expression and accumulation of protein following induction. A preferred transactivator is, for example, viral RNA polymerase. Preferred promoters of this type are promoters recognized by a single-subunit RNA polymerase, such as the T7 gene 10 promoter, which is recognized by the bacteriophage T7 DNA-dependent RNA polymerase. The gene encoding the T7 polymerase is preferably transformed into the nuclear genome, and the T7 polymerase is targeted to the plastids using a plastid transit peptide.
Promoters suitable for nuclear expression of a gene, for example a gene encoding a viral RNA polymerase such as the T7 polymerase, are described above and elsewhere in this application. Expression of DNA molecules in plastids can be constitutive or inducible, and such plastid expression can also be organ- or tissue-specific. Examples of various expression systems are extensively described in WO 98/11235, the entire disclosure of which is hereby incorporated by reference. Thus, in one aspect, the present invention utilizes coupled expression in the nuclear genome of a chloroplast-targeted phage T7 RNA polymerase under the control of the chemically inducible PR-1a promoter, for example the PR-1 promoter of tobacco, operably linked with a chloroplast reporter transgene regulated by T7 gene 10 promoter/terminator sequences, for example as described in U.S. Pat. No. 5,614,395, the entire disclosure of which is hereby incorporated by reference. In another embodiment, when plastid transformants homoplasmic for the maternally inherited TR genes are pollinated by lines expressing the T7 polymerase in the nucleus, F1 plants are obtained that carry both transgene constructs but do not express them until synthesis of large amounts of enzymatically active protein in the plastids is triggered by foliar application of the PR-1a inducer compound benzo(1,2,3)thiadiazole-7-carbothioic acid S-methyl ester (BTH).

[0325] In a preferred embodiment, two or more genes, for example TR genes, are transcribed from the plastid genome from a single promoter in an operon-like polycistronic gene. In a preferred embodiment, the operon-like polycistronic gene includes an intervening DNA sequence between two genes in the operon-like polycistronic gene. In a preferred embodiment, the DNA sequence is not present in the plastid genome, to avoid homologous recombination with plastid sequences. In another preferred embodiment, the DNA sequence is derived from the 5′ untranslated region (UTR) of a non-eukaryotic gene, preferably from a viral 5′ UTR, preferably from a 5′ UTR derived from a bacterial phage, such as a T7, T3 or SP6 phage. In a preferred embodiment, a portion of the DNA sequence may be modified to prevent the formation of RNA secondary structures in an RNA transcript of the operon-like polycistronic gene, for example between the DNA sequence and the RBS of the downstream gene. Such secondary structures may inhibit or repress the expression of the downstream gene, particularly the initiation of translation. Such RNA secondary structures are predicted by determining their melting temperatures using computer models and programs such as the “mfold” program version 3 (available from Zuker and Turner, Washington University School of Medicine, St. Louis, Mo.) and other methods known to one skilled in the art.
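As a rough illustration of screening an intervening sequence for potential secondary structure, the following Python sketch uses the Nussinov base-pair maximization algorithm. This is a deliberately simpler technique than mfold, which minimizes free energy using nearest-neighbour thermodynamic parameters, and it does not estimate melting temperatures; it only counts the maximum number of nested Watson-Crick and G-U pairs, so a high count merely hints that a stem could form near the downstream RBS. The function name and the minimum hairpin-loop size of 3 are assumptions.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Nussinov dynamic programming: maximum number of nested base pairs in `seq`.

    Paired bases must enclose at least `min_loop` unpaired bases (hairpin loop).
    """
    seq = seq.upper().replace("T", "U")  # accept DNA notation
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j] (inclusive).
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # leave i or j unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # pair i with j
            for k in range(i + 1, j):  # split into two independent halves
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


# A four-base G-C stem closing an AAAA loop.
print(max_base_pairs("GGGGAAAACCCC"))  # 4
```

In practice, a thermodynamic folding program such as the one cited above should be used to decide whether a modification to the intervening sequence is needed; this sketch only shows the dynamic-programming idea behind such tools.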

[0326] The presence of the intervening DNA sequence in the operon-like polycistronic gene increases the accessibility of the RBS of the downstream gene, thus resulting in higher rates of expression. Such a strategy is applicable to any two or more genes to be transcribed from the plastid genome from a single promoter in an operon-like chimeric gene.

[0327] Numerous transformation vectors available for plant transformation are known to those of ordinary skill in the art, and the genes pertinent to this invention can be used in conjunction with any such vectors. Vector selection will depend upon the preferred transformation technique and the target species being transformed. For certain target species, different antibiotic or herbicide selection markers may be preferred.

[0328] Selection markers used routinely in transformation include the nptII gene, which confers resistance to kanamycin and related antibiotics (Messing & Vieira (1982) Gene 19:259-268; Bevan et al. (1983) Nature 304:184-187), the bar gene, which confers resistance to the herbicide phosphinothricin (White et al. (1990) Nucl Acids Res 18: 1062; Spencer et al. (1990) Theor Appl Genet 79:625-631), the hph gene, which confers resistance to the antibiotic hygromycin (Yanofsky et al. (1992) Gene 117:161-167), the dhfr gene, which confers resistance to methotrexate (Bourouis et al., EMBO J. 7:1099-1104 (1983)), the EPSPS gene, which confers resistance to glyphosate (U.S. Pat. Nos. 4,940,935 and 5,188,642), and the mannose phosphate isomerase gene pmi, which confers tolerance to the normally phytotoxic sugar mannose (Negrotto et al. (2000) Plant Cell Rep 19:798-803).

[0329] Many vectors are suitable for transformation using Agrobacterium tumefaciens. These typically carry at least one T-DNA border sequence and include vectors such as pBIN19 (Bevan (1984) Nucl Acids Res) and pXYZ. Typical vectors suitable for Agrobacterium transformation include the binary vectors pCIB200 and pCIB2001, as well as the binary vector pCIB10 and hygromycin selection derivatives thereof (U.S. Pat. No. 5,639,949).

[0330] Transformation without the use of Agrobacterium tumefaciens circumvents the requirement for T-DNA sequences in the chosen transformation vector. Consequently, vectors lacking these sequences can be used as an alternative to vectors such as the T-DNA-containing vectors described above.

[0331] Transformation techniques that do not rely on Agrobacterium include transformation via particle bombardment, protoplast uptake (for example, PEG- and/or electroporation-mediated), and microinjection. The choice of vector depends largely on the preferred selection for the species being transformed. Typical vectors suitable for non-Agrobacterium transformation include pCIB3064, pSOG19, and pSOG35 (U.S. Pat. No. 5,639,949).

[0332] Once the coding sequence of interest has been cloned into an expression system, it is transformed into a plant cell. Methods for transformation and regeneration of plants are well known in the art. For example, Ti plasmid vectors have been utilized for the delivery of foreign DNA, as have direct uptake of DNA, liposomes, electroporation, microinjection, and microprojectiles. In addition, bacteria from the genus Agrobacterium can be utilized to transform plant cells.

[0333] Transformation techniques for dicotyledons are well known in the art and include Agrobacterium-based techniques and techniques that do not require Agrobacterium. Non-Agrobacterium techniques involve the uptake of exogenous genetic material directly by protoplasts or cells. This can be accomplished by PEG- or electroporation-mediated uptake, particle bombardment-mediated delivery, or microinjection. In each case the transformed cells are regenerated to whole plants using standard techniques known in the art.

[0334] Methods for transformation of many dicot and monocot species are well known in the art. Preferred techniques include direct gene transfer into protoplasts using PEG or electroporation techniques, particle bombardment into callus tissue, as well as Agrobacterium-mediated transformation.

[0335] In addition, the candidate variant library protein may also be made as a fusion protein, using techniques well known in the art. For example, the variant protein may be fused to other proteins to increase expression or stabilize the protein. Similarly, other fusion partners may be used, such as antibodies; targeting sequences that allow localization of the library members into a subcellular or extracellular compartment of the cell; rescue sequences or purification tags that allow the purification or isolation of either the library protein or the nucleic acids encoding it; stability sequences, which confer stability or protection from degradation; fusion proteins including reporter, detection and selection genes or proteins; or combinations of these, as well as linker sequences as needed.

[0336] In a preferred embodiment, the candidate variant proteins or candidate variant library proteins are purified or isolated after expression. Variant proteins may be isolated or purified in a variety of ways known to those skilled in the art, depending on what other components are present in the sample. Standard purification methods include electrophoretic, molecular, immunological and chromatographic techniques, including ion exchange, hydrophobic, affinity, and reverse-phase HPLC chromatography, and chromatofocusing. Ultrafiltration and diafiltration techniques, in conjunction with protein concentration, are also useful. For general guidance in suitable purification techniques, see Scopes, R., Protein Purification, Springer-Verlag, NY (1982). The degree of purification necessary will vary depending on the use of the variant protein. In some instances, no purification will be necessary.

[0337] Once made, the variant TR proteins may be experimentally tested and validated in in vivo and in vitro assays. Suitable assays include primary and secondary screening assays and characterization of the kinetic parameters of purified protein, i.e., kcat and Km (see FIGS. 11 and 12).
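For readers less familiar with the kinetic parameters named above, the following Python sketch shows one way kcat and Km could be estimated from initial-rate data under Michaelis-Menten assumptions, v = kcat[E][S]/(Km + [S]). It uses a Lineweaver-Burk (double-reciprocal) linear fit purely for brevity; this is an illustrative choice and not necessarily the procedure used for FIGS. 11 and 12, and nonlinear regression is generally preferred because the reciprocal transform amplifies error at low substrate concentrations. The function name and the example numbers are hypothetical.

```python
def fit_michaelis_menten(substrate, rates, enzyme_conc):
    """Estimate (kcat, Km) from initial rates via a Lineweaver-Burk fit.

    1/v = (Km / Vmax) * (1/[S]) + 1 / Vmax, and kcat = Vmax / [E].
    Units follow the inputs: e.g. [S] in uM, v in uM/s and [E] in uM
    give kcat in 1/s and Km in uM.
    """
    if len(substrate) != len(rates) or len(substrate) < 2:
        raise ValueError("need at least two paired (substrate, rate) points")
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals Km / Vmax
    intercept = mean_y - slope * mean_x    # equals 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return vmax / enzyme_conc, km


# Hypothetical noise-free data generated with Vmax = 10, Km = 2, [E] = 0.5.
s_values = [0.5, 1.0, 2.0, 4.0, 8.0]
v_values = [10 * s / (2 + s) for s in s_values]
kcat, km = fit_michaelis_menten(s_values, v_values, enzyme_conc=0.5)
print(round(kcat, 6), round(km, 6))  # 20.0 2.0
```

Comparing kcat/Km between a variant and the wild-type enzyme is a common single-number summary of catalytic efficiency when ranking variants.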

[0338] Once made, the variant TR proteins and nucleic acids of the invention find use in a number of applications. In a preferred embodiment, the variant TRs are used to reduce the antigenicity of glutens in wheat, rye and barley.

[0339] In other embodiments, the variant TRs are used to reduce the disulfide bonds in toxic proteins, such as those found in the venoms of snakes, bees and scorpions, and in the bacterial tetanus and botulinum neurotoxins.

[0340] In a preferred embodiment, the variant TRs are used to reduce alternative substrates. Alternative useful substrates for thioredoxin reductases include a number of plant and mammalian proteins found to contain thioredoxin domains. For example, protein disulfide isomerase (PDI) contains two regions that exhibit internal sequence homology to thioredoxin. PDI is a substrate for thioredoxin reductase. Protein disulfide isomerases have been identified from mammalian sources, such as bovine (Yamauchi et al., Biochem. Biophys. Res. Commun. 146:1485-1492, 1987), chicken (Parkkonen et al., Biochem. J. 256:1005-1011, 1988), human (Pihlajaniemi et al., EMBO J. 6:643-649, 1987), mouse (Gong et al., Nucleic Acids Res. 16:1203, 1988), rabbit (Fliegel et al., J. Biol. Chem. 265:15496-15502, 1990), and rat (Edman et al., Nature 317:267-270, 1985). PDI has also been isolated from yeast (Tachikawa et al., J. Biochem. 110:306-313). Suitable PDIs can be found in WO9501425, published Jan. 12, 1995, and WO9500636, published Jan. 5, 1995, as well as other PDIs known in the art, including human and plant forms.

[0341] Compositions and uses of redox agents that are substrates of thioredoxin reductase, such as thioredoxin and PDI, are known in the art, and are discussed herein. Disulfide linkages are present in many types of proteins, such as enzymes, structural proteins, etc. Enzymes are catalytic proteins such as proteases, amylases, etc., while structural proteins can be scleroproteins such as keratin, etc. Protein material in hair, wool, skin, leather, hides, food, fodder, stains, and human tissue contains disulfide linkages. Treatment of some of these materials with PDI and thioredoxin and a redox partner has been described previously. By way of example, the use of thioredoxin for waving, straightening, removing and softening of human and animal hair is described in EP 183506 and WO8906122. U.S. Pat. No. 4,771,036 also describes the use of thioredoxin for prevention and reversal of cataracts. Use of thioredoxin to prevent metal-catalysed oxidative damage in biological reactions is described by Pigiet et al. in EP 237189. EP 272781 and EP 276547 describe the use of PDI for reconfiguration of human hair and for treatment of wool, respectively. The uses of such enzymes have all been connected with reduction of protein disulfide linkages to free protein sulfhydryl groups and/or the rearrangement of disulfide linkages in the same or between different polypeptides. Consequently, thioredoxin reductases of the invention can be added to such compositions as a redox partner, optionally with the cofactor NADH or NADPH, to regenerate the redox agent and thus enhance the compositions' usefulness.
In an alternative embodiment, the thioredoxin reductase variants of the invention are provided as protein fusions with the redox agent, as taught herein. For example, the compositions can be used for the treatment or degradation of scleroproteins, especially hair, skin and wool; dehairing and softening of hides; treatment and cleaning of fabrics; as additives to detergents; thickening and gelation of food and fodder; strengthening of gluten in bakery or pastry products; and as pharmaceuticals for the alleviation of eye disorders. The compositions of the invention, particularly with PDI, can be used with other protein-containing materials to generate intermolecular protein disulfide cross-links, yielding high-molecular-weight or gelled compositions. Thus the present invention can be used in the field of food processing, such as of raw fish meat paste, kamaboko (fish cake), fish/livestock meat sausage, tofu (soy bean curd), noodles, confectionery, bread, dough, food adhesives, sheet-like meat food, yogurt, jelly and cheese. In addition, the compositions can be used as novel protein-derived materials in a wide range of industries, including cosmetics, raw materials of microcapsules and carriers of immobilized enzymes.

[0342] In a preferred embodiment, variant TR-oleosin-thioredoxin and oleosin-variant thioredoxin-reductase fusion proteins accumulate in association with the oil bodies. In an alternate embodiment, oleosin-thioredoxin/variant thioredoxin-reductase hybrid fusion proteins accumulate in association with the oil bodies. The oil bodies can be fractionated to achieve partial purification of the fusion proteins. Purified oil bodies, with the associated fusion proteins, can be used as ingredients for testing thioredoxin and thioredoxin-reductase activity and functional benefits in dermal (cosmetic) or food applications. Oil bodies have very suitable processing and formulation characteristics for cosmetic and food ingredients. Therefore, delivery of thioredoxin and/or thioredoxin-reductase as oleosin fusions associated with oil bodies simplifies processing and increases product stability.

[0343] In an alternate embodiment, a second purification step can be performed to purify thioredoxin or thioredoxin-reductase from the oil bodies. This leads to a highly purified preparation of the proteins that can be used as an ingredient for testing the activity of thioredoxin and thioredoxin-reductase, and for providing functional benefits in cosmetics or food uses. See also U.S. Patent Publication No. 2002/0037303, incorporated herein by reference.

[0344] In addition to other formulations and composition embodiments discussed herein, e.g., oil body embodiments, the compositions of the invention can contain soluble thioredoxin reductases and/or redox agents, and other ingredients known in the art, e.g., excipients, stabilizers, fillers, detergents, etc. The compositions can be formulated in any convenient form, e.g., as a powder, paste or liquid, or in granular form. The enzyme(s) may be stabilized in a liquid by inclusion of enzyme stabilizers. Usually, the pH of a solution of the composition will be 5-10, and in some instances 7.0-8.5. Often a sterile composition is preferred, depending on the use.

[0345] Additionally, the performance of grain and grain-derived products in livestock feed is also affected by inter- and intramolecular disulphide bonding. Grain digestibility, nutrient availability, and the neutralization of anti-nutritive factors (e.g., protease and amylase inhibitors) would be increased by reducing the extent of disulphide bonding (see WO 00/36126, filed Dec. 15, 1999). Expression of transgenic thioredoxin reductase variants, optionally with thioredoxin, in corn and soybeans, and the use of thioredoxin reductase in grain processing, e.g., wet milling, provide an alternative method for reducing the disulfide bonds in seed proteins during or prior to industrial processing. The invention therefore provides grains with altered storage protein quality, as well as grains that perform qualitatively differently from normal grain during industrial processing or animal digestion (both referred to subsequently as “processing”). This method of delivering thioredoxin reductase, optionally with thioredoxin, eliminates the need to develop exogenous sources of thioredoxin and/or thioredoxin reductase for addition during processing. A second advantage of supplying thioredoxin and/or thioredoxin reductase via the grains is that physical disruption of seed integrity is not necessary to bring the enzyme into contact with the storage or matrix proteins of the seed, either prior to processing or as an extra processing step. The invention described herein is applicable to all grain crops, in particular corn, soybean, wheat, and barley, more particularly corn and soybean, and especially corn.
Expression of transgenic thioredoxin reductase, optionally with thioredoxin, in grain is a means of altering the quality of the material (seeds) going into grain processing, altering the quality of the material derived from grain processing, maximizing yields of specific seed components during processing (increasing efficiency), changing processing methods, and creating new uses for seed-derived fractions or components from milling streams. The invention thus provides a plant that expresses a thioredoxin reductase variant, optionally with thioredoxin, preferably under control of an inducible promoter, for example either operatively linked to the inducible promoter or under control of a transactivator-regulated promoter, wherein the corresponding transactivator is under control of the inducible promoter or is expressed in a second plant such that the promoter is activated by hybridization with the second plant; and wherein the TR is preferably thermostable or a eukaryotic reductase. Such plants also include seed therefor, which seed is optionally treated (e.g., primed or coated) and/or packaged, e.g., placed in a bag with instructions for use, and seed harvested therefrom, e.g., for use in a milling process as described above. The transgenic plant of the invention may optionally further comprise genes for enhanced production of NADPH or NADH.

[0346] The invention further provides a method for producing starch and/or protein comprising extracting starch or protein from seed harvested from a plant as described above, and a method for wet milling comprising steeping seed from a thioredoxin reductase-expressing plant as described above and extracting starch and/or protein therefrom. Heat-stable enzymes are preferred, such as those from a thermophilic organism, e.g., from an archaeon, for example from Methanococcus jannaschii or Archaeoglobus fulgidus, e.g., as described herein.

[0347] Expression of transgenic thioredoxin reductase variants, optionally with thioredoxin, in grain is also useful to improve grain characteristics associated with digestibility, particularly in animal feeds. Susceptibility of feed proteins to proteases is a function of time and of protein conformation. Kernel cracking is often used in feed formulation, as is steam flaking; both of these processes are designed to aid kernel digestibility. Softer kernels, whose integrity can be disrupted more easily in animal stomachs, are desirable. Conformational constraints and crosslinks between proteins are major determinants of protease susceptibility, so modifying these bonds by increased thioredoxin and/or thioredoxin reductase expression aids digestion. Protein content and quality are important determinants in flaking grit production and in masa production. Reduction of disulphide bonds alters the nature of corn flour such that it is suitable for use as a wheat substitute, especially flours made from high-protein white corn varieties. Over half of the US soybean crop is crushed or milled, and the protein quality in the resulting low-fat soy flour or de-fatted soy flour (or soybean meal) is important for subsequent processing. Protein yield and quality from soybean processing streams are economically important and are largely dependent upon protein conformation. Increasing thioredoxin activity through expression of transgenic thioredoxin and/or thioredoxin reductase increases protein solubility, and thus yield, in the water-soluble protein fractions. Recovery is facilitated by aqueous extraction of de-fatted soybean meal under basic conditions.
Enhancing thioredoxin activity through expression of transgenic thioredoxin and/or thioredoxin reductase also reduces the pH required for efficient extraction, and thereby reduces calcium or sodium hydroxide inputs and lowers the acid input for subsequent acid precipitation, allowing efficient recovery of proteins without alkali damage and reducing water consumption and processing plant waste effluents (which contain substantial biological oxygen demand loads). Protein redox status affects important functional properties supplied by soy proteins, such as solubility, water absorption, viscosity, cohesion/adhesion, gelation and elasticity. Fiber removal during soy protein concentrate production, and soy protein isolate hydrolysis by proteases, are enhanced by increasing thioredoxin activity as described herein. Similarly, as described for corn above, increasing thioredoxin activity through expression of transgenic thioredoxin and/or thioredoxin reductase enhances the functionality of enzyme-active soy flours and the digestibility of the soybean meal fraction and steam flaking fraction in animal feeds. Modification of protein quality both during seed development and during processing is provided, although it is preferred that the transgenic thioredoxin and/or thioredoxin reductase be targeted to a cell compartment and be thermostable, as described above, to avoid significant adverse effects on storage protein accumulation possibly encountered as a result of thioredoxin activity during seed development. Alternatively, the thioredoxin reductase variant, and optionally thioredoxin, can be added as a processing enzyme (or as fusions as taught herein), since (in contrast to corn wet milling) breaking the disulphide bonds is not necessary until after grain integrity is destroyed (crushing and oil extraction). Protein disulfide isomerase (PDI) is also useful as described above for thioredoxin.
Regarding use of oil bodies with TR, US 2002/0037303, entitled “Thioredoxin and thioredoxin reductase containing oil body based products”, published Mar. 28, 2002, is incorporated herein by reference.

[0348] Additional uses of the enzymes of the invention for seed and grain can be found in WO 00/58453, published Oct. 5, 2000. Thioredoxin reductase variants can be expressed, optionally with thioredoxin, or added exogenously, for the uses described therein for seed and grain quality enhancement. Transgenic plants of interest include barley, wheat, Arabidopsis, tobacco, rice, Brassica, Picea, soy bean, maize, oat, rye, sorghum, millet, triticale, and forage and turf grass. A transgenic plant of the invention can have reduced allergenicity in comparison to the same part of a non-transgenic plant of the same species. The allergenicity can be hypersensitivity, wherein said hypersensitivity is reduced by at least 5%. Further, a transgenic plant of the invention can have increased digestibility in comparison to the same part of a non-transgenic plant of the same species, the digestibility being increased by at least 5 percent. At least part of a transgenic plant can have an earlier onset and/or an increased expression of a gibberellic acid-inducible enzyme in comparison to the same part of a non-transgenic plant of the same species. Preferably the enzyme is pullulanase or alpha-amylase. The parts of the plant are preferably edible parts, more preferably grain or seed. Preferred promoters are seed or grain maturation-specific promoters, e.g., selected from the group consisting of rice glutelins, rice oryzins, rice prolamines, barley hordeins, wheat gliadins, wheat glutelins, maize zeins, maize glutelins, oat glutelins, sorghum kafirins, millet pennisetins, rye secalins, and a maize embryo-specific globulin. Other embodiments are a food, feed or beverage product made from the transgenic seed or grain of the invention. The food, feed, or beverage can be flour, dough, bread, pasta, cookies, cake, thickener, beer, malted beverage, or a food additive. The food, feed, or beer product of the invention can have reduced allergenicity and/or increased digestibility.
Further, a dough product can have increased strength and volume in comparison to a dough made from a non-transgenic seed or grain of the same species. The food, feed, or beverage can have hyperdigestible protein and/or hyperdigestible starch. The food, feed, or beverage can be hypoallergenic. The above embodiments can also be achieved by exogenous addition of the enzymes of the invention, as would be known in the art. It has been shown that reduction of disulfide protein allergens in wheat and milk by thioredoxin decreases their allergenicity. Thioredoxin treatment also increases the digestibility of the major allergen of milk (beta-lactoglobulin), as well as of other disulfide proteins. A more detailed discussion of the benefits of adding exogenous thioredoxin to food products is presented in U.S. Pat. No. 5,792,506, which is specifically incorporated herein by reference. These compositions and methods can be enhanced using the TR variants of the invention.

[0349] As discussed herein, the proteins of the invention can be used to reduce the allergenicity of proteins in food and feed. See, for example, U.S. Pat. No. 6,190,723 and references therein, which are specifically incorporated herein by reference, for uses of thioredoxin with thioredoxin reductase and NADPH as exogenously added treatments. Skin tests and feeding experiments carried out with sensitized dogs showed that treatment of their food prior to ingestion eliminated or decreased the allergenicity of the food.

[0350] Consequently, provided herein are compositions for, and methods of, decreasing the allergenicity of an allergenic food or feed protein. The food or feed protein, or the food or feed containing the protein or proteins, is contacted with an amount of thioredoxin, thioredoxin reductase, and cofactor, namely NADPH, NADH or a combination thereof, effective for decreasing the allergenicity of the protein. This can be followed by administering the contacted protein to an animal or human, wherein the allergenic symptoms exhibited by the animal or human are decreased as compared to a control. The allergenic food/feed protein is preferably selected from beef, cow's milk, egg, soy, rice and wheat proteins. Also embodied are ingestible food/feed products containing thioredoxin and a TR variant and further containing cofactor. The enzymes may be exogenously added, or one or the other may be transgenically or naturally present, singly or as a fusion. The ingestible food is preferably hypoallergenic because of the treatment. The food product can be a pet food or a baby food or formula. The food product can contain beef, egg, soy, wheat or milk protein. It can be an ingestible meat food product. U.S. Pat. No. 5,792,506 and its references are incorporated by reference.

[0351] Similarly, U.S. Pat. No. 6,114,504 provides compositions and methods for reducing cystine-containing animal and plant proteins and for improving the characteristics of dough and baked goods, which include the steps of mixing dough ingredients with a thiol redox protein to form a dough and baking the dough to form a baked good. The method preferably uses reduced thioredoxin with wheat flour, which imparts a stronger dough and higher loaf volumes. These methods and compositions are enhanced using the proteins of the invention. One method of reducing a glutenin or gliadin protein comprises adding thioredoxin to a liquid or substance containing said glutenin or gliadin protein, reducing the thioredoxin by means of a thioredoxin reductase variant and a cofactor, namely NADPH, NADH or a combination thereof, and reducing the glutenin or gliadin protein by the reduced thioredoxin. A corresponding composition contains a glutenin or gliadin protein, added or endogenous thioredoxin, added or endogenous (as from a transgenic plant) thioredoxin reductase variant, and added cofactor, namely NADPH, NADH or a combination thereof. The method is useful to reduce any water-insoluble or water-soluble seed-derived protein: thioredoxin is added to a liquid or substance containing said protein, and the thioredoxin is reduced by means of a thioredoxin reductase variant and its cofactor, namely NADPH, NADH or a combination thereof.

[0352] The invention is also useful for increasing the digestibility of food and feed proteins. See U.S. Pat. No. 5,952,034, which provides compositions and methods to increase the digestibility of food proteins by thioredoxin reduction. These methods are enhanced by use of the enzymes of the invention. Compositions and methods for increasing the digestibility of a food comprise treating a food with an amount of thioredoxin, thioredoxin reductase variant, and its cofactor, namely NADPH, NADH or a combination thereof, effective for increasing the digestibility of the food; and optionally administering the treated food to an animal or human, thereby increasing the digestibility of the food as measured by the symptoms exhibited by said animal or human as compared to a control. The food preferably contains milk, wheat or eggs. In the above embodiments, the thioredoxin reductase variant can be provided as a protein fusion with thioredoxin.

[0353] The compositions of the invention also find additional uses. Thioredoxin and other redox agents, such as PDI, are known to be useful in protection against stress and injury. Accordingly, the compositions of the invention can be used to enhance redox agent compositions for such treatment. In one embodiment, TR variants are used to manipulate nitrosative stress to upregulate nitrosative stress defenses. See U.S. Pat. No. 6,359,004. Thioredoxin can act as a radical scavenger; thus diseases and conditions related to free radicals can be treated with TR variants, preferably in combination with thioredoxin. Thus, in one aspect, the present invention provides compositions and methods for the prevention or treatment of eye diseases, such as cataracts. In another aspect, the present invention relates to the prevention or treatment of diseases caused by oxidative stress or having oxidative stress as a component. See, for example, U.S. Pat. No. 6,379,664. In one embodiment, compositions and methods are provided for inhibiting or reversing the formation of a cataract in an eye by contacting the eye with an effective cataract-inhibiting amount of a composition of the invention containing a TR variant, preferably in combination with thioredoxin. In another embodiment, intraocular injection of thioredoxin in combination with a TR variant and cofactor suppresses retinal photooxidative stress and serves as a therapeutic strategy to prevent retinal photic injury. In another embodiment, compositions of the invention containing thioredoxin activity are useful to treat or minimize oxidative stress and ischemia-reperfusion-induced acute lung injury. Consequently, they further find use in lung transplantation, particularly in patients with end-stage lung diseases, such as cystic fibrosis, emphysema, pulmonary fibrosis, and pulmonary hypertension. The compositions of the invention also find use as storage compositions to maintain the integrity of organs for transplant.
In another embodiment, thioredoxin in combination with the TR variants promotes the in vitro survival of primary cultured neurons. Further, the compositions will provide a neuroprotective effect in the penumbra to modify neuronal damage during focal brain ischemia. The compositions will also provide protection and improvement of motor neurons from or after nerve injury. In another embodiment, compositions of the invention protect the retina from ischemia-reperfusion injury. Burn injuries can also be treated with compositions of the invention. Thioredoxin and TR variants provide a rapid antioxidant defense and improve coagulation processes, cell growth, and control of the extracellular peroxide tone, which are intimately linked to cytoprotection and wound healing in burns. Finally, the compositions of the invention provide thiol antioxidants that are good candidates for controlling Epstein-Barr virus (EBV) infection.

[0354] TR variants can provide direct benefit by removing the deleterious ascorbyl free radical and dehydroascorbate, which are reduced to ascorbic acid by thioredoxin reductase. Thus TR provides a direct antioxidant effect and treatment. The compositions can optionally contain cofactors.

[0355] In the diseases and conditions described herein, the TR variants can be supplied alone or in combination with thioredoxin or other redox agents and cofactors. The enzymes may be separate or fused. The TR variant may act with host redox agents, or a redox agent can be exogenously added.

[0356] The following examples serve to more fully describe the manner of using the above-described invention, as well as to set forth the best modes contemplated for carrying out various aspects of the invention. It is understood that these examples in no way serve to limit the true scope of this invention, but rather are presented for illustrative purposes. All references cited herein, including U.S. Ser. No. 60/289,029, filed May 4, 2001, U.S. Ser. No. 60/370,609, filed Apr. 5, 2002, and the provisional application by Desjarlais and Muchhal, entitled “Novel Nucleic Acids and Proteins with Thioredoxin Reductase Activity”, filed Apr. 29, 2002, serial number not assigned, are incorporated by reference.

EXAMPLES

Example 1

[0357] Computational Design of Variant Proteins

[0358] Overview

[0359] The initial PDA™ design strategy for creating variants with improved NADH-dependent TR activity is detailed below. In short, structural information from both the E. coli and Arabidopsis enzymes, together with the diversity of co-factor conformations, was used to design two different libraries (referred to henceforth as TR-1 and TR-2), each with ~2000 combinatorial members.

[0360] Wild-type TR genes used as scaffold proteins:

[0361] 1) Arabidopsis NTR1 gene cloned in pET29a. The encoded protein has an N-terminal S-tag. The protein may be expressed using BL21-S1 cells (salt induced) or BL21-Star cells (IPTG induced), and lysed using BugBuster HT.

[0362] 2) Thioredoxin j: a codon-optimized gene synthesized and cloned in pDEST-14 and expressed in BL21-S1-Star. The soluble fraction was used as substrate during primary screenings. N- and/or C-terminal His-tagged versions were made. The C-terminal His-tagged TRx was purified by affinity chromatography for use in kinetic determinations.

[0363] Assay: kinetic assay based on continuous detection, at 412 nm, of the formation of the reduced product of DTNB (2-nitro-5-thiobenzoate, TNB).
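As a minimal sketch of how a ΔA412/min slope from this assay could be turned into a reaction rate, the Python below applies the Beer-Lambert law. It assumes a TNB extinction coefficient of 14,150 M⁻¹ cm⁻¹ (a commonly cited literature value; the value used in these experiments is not stated here), a 1 cm path length, and a stoichiometry of two TNB formed per NAD(P)H oxidized, because each DTNB disulfide that is reduced releases two TNB anions. The function names are hypothetical.

```python
"""Convert DTNB-assay absorbance slopes into rates (illustrative sketch)."""

# Assumed molar extinction coefficient of TNB at 412 nm (M^-1 cm^-1).
EPSILON_TNB_412 = 14_150.0


def tnb_rate_um_per_min(delta_a412_per_min: float, path_cm: float = 1.0,
                        epsilon: float = EPSILON_TNB_412) -> float:
    """Rate of TNB formation in µM/min from the slope of A412 versus time.

    Beer-Lambert: concentration (M) = absorbance / (epsilon * path length).
    """
    return delta_a412_per_min / (epsilon * path_cm) * 1e6


def cofactor_rate_um_per_min(delta_a412_per_min: float, path_cm: float = 1.0) -> float:
    """NAD(P)H consumption rate in µM/min.

    One NAD(P)H supplies two electrons, which reduce one DTNB disulfide and
    release two TNB anions, so the TNB rate is divided by two.
    """
    return tnb_rate_um_per_min(delta_a412_per_min, path_cm) / 2.0


def specific_activity_umol_min_mg(delta_a412_per_min: float, volume_ml: float,
                                  enzyme_mg: float, path_cm: float = 1.0) -> float:
    """µmol NAD(P)H oxidized per minute per mg of TR in the reaction."""
    rate_um = cofactor_rate_um_per_min(delta_a412_per_min, path_cm)  # µmol/L/min
    return rate_um * (volume_ml / 1000.0) / enzyme_mg


# Illustrative input only: a slope of 0.1415 A412/min in a 1 cm cuvette.
print(tnb_rate_um_per_min(0.1415))       # about 10 µM TNB per minute
print(cofactor_rate_um_per_min(0.1415))  # about 5 µM NAD(P)H per minute
```

In a microplate reader the effective path length depends on the fill volume, so `path_cm` would need to be measured or corrected rather than left at 1 cm.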

[0364] A more detailed overview of the screening strategy used for identification and kinetic characterization of “hits” is described in FIG. 4.

[0365] Purified proteins were used for all the kinetic characterizations and for the second- and third-tier screenings. High-throughput procedures for generating the required amounts of purified proteins were either independently developed or adapted from existing commercial protocols. A snapshot of these methods is presented in FIG. 5. The detailed protocols used for high-throughput culture, induction, expression, protein purification and enzymatic characterization are described below.

[0366] The kinetic parameters (Km and Kcat) of the purified WT NTR-1 enzyme (unmodified) were determined with respect to both the NADH and NADPH substrates to define the benchmark for PDA™-designed variants. The WT enzyme has ~4-fold higher Kcat (equivalent to the Vmax using 1 µg of TR protein) for the native (NADPH) co-factor than for NADH. In addition, the Km is ~50-fold higher for NADH than for NADPH. The data for the WT enzyme are presented in FIG. 6.
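The two fold-differences quoted for the WT enzyme combine multiplicatively in the specificity constant Kcat/Km. The Python sketch below uses only the approximate ~4-fold and ~50-fold ratios from the text, with otherwise normalized, hypothetical units (Km for NADPH set to 1, Vmax for NADH set to 1). It suggests that WT NTR-1 would prefer NADPH by roughly 200-fold at low co-factor concentrations, and that the preference shrinks toward the ~4-fold Kcat ratio as the co-factor approaches saturating levels.

```python
def michaelis_menten(s: float, vmax: float, km: float) -> float:
    """Initial rate v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def efficiency_preference(kcat_fold_nadph: float, km_fold_nadh: float) -> float:
    """(kcat/Km)_NADPH / (kcat/Km)_NADH = (kcat_P / kcat_H) * (Km_H / Km_P)."""
    return kcat_fold_nadph * km_fold_nadh


# Approximate WT ratios from the text; absolute units are normalized (hypothetical).
KCAT_FOLD, KM_FOLD = 4.0, 50.0
VMAX_NADPH, KM_NADPH = KCAT_FOLD, 1.0
VMAX_NADH, KM_NADH = 1.0, KM_FOLD

print(efficiency_preference(KCAT_FOLD, KM_FOLD))  # 200.0

# Co-factor concentration expressed in units of Km(NADPH).
for s in (0.01, 1.0, 100.0):
    ratio = (michaelis_menten(s, VMAX_NADPH, KM_NADPH)
             / michaelis_menten(s, VMAX_NADH, KM_NADH))
    print(f"[S] = {s:g} x Km(NADPH): NADPH/NADH rate ratio = {ratio:.1f}")
```

This comparison rests on simple Michaelis-Menten behaviour; because TR also binds thioredoxin, the apparent constants depend on the fixed concentration of that second substrate in the assay.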

[0367] The TR libraries were constructed using standard molecular biology procedures of site-directed mutagenesis and recursive PCR. Combinatorial pieces representing specifically mutated gene segments were joined together using specific restriction enzymes. The quality of these libraries was evaluated by sequence and expression analysis of randomly picked clones. These details for TR-1 and TR-2 are presented in FIGS. 7 and 8, respectively. In addition to these combinatorial libraries, the individual C-region combinations for each of these two libraries (24 for TR-1 and 48 for TR-2) were synthesized in the WT backbone to evaluate the effect of this critical region identified by PDA™. These clones, along with the individual members of TR-3 and TR-4 (see below), are henceforth referred to as “defined clones”.

[0368] A computationally relevant description of the two libraries is presented in FIGS. 9A and B, in which the designed positions (orange) and the docked co-factor (blue or yellow) in the appropriate conformation are identified.

[0369] In addition to these two libraries, two very small libraries were generated to explore additional strategies. TR-3 had 18 members and was designed as a fine-tuning approach based on results for the best clone from TR-2 screening. TR-4 had 16 members and was based on a sequence alignment of TR and AhpF sequences. AhpF encodes an NADH-dependent peroxiredoxin reductase, an activity analogous to TR.

[0370] A summary of the results from screening these 4 libraries is presented in FIG. 10.

[0371] Screening of the TR-1 library did not identify any clones with significantly improved TR activity with NADH as a co-factor compared to WT NTR-1. This is likely the result of using the “incorrect” co-factor conformation.

[0372] The TR-2 library had several clones with significantly improved NADH-dependent activities. Two of the best variants, with different C-region sequences, were “RYN” and “RFN”. Mutations in other designed positions did not have a significant effect on the overall properties of the TR enzymes. Detailed kinetic data for many of these variants are presented below.

[0373] The kinetic parameters of M-RYN, L-RYN and WT, and their activities at different co-factor concentrations, are described in FIGS. 11A and B, respectively. Both of these variants have significantly higher NADH-dependent activities than WT. In addition, they have significantly reduced NADPH-dependent activity; this is termed a “Co-factor Switch”. At co-factor concentrations of 2.5 mM and above, both of these PDA™-designed NTRs have >50% of WT NADPH activity with NADH as co-factor.

[0374] The sequence alignment of these clones and their relative computational rankings from the design perspective are shown in FIG. 17A.

[0375] The presence of N in the RYN and RFN clones created a potential glycosylation site. This site was “designed out” using PDA™ without significantly affecting the activity profile of these clones. The data and strategy for this are described below.

[0376] A computational representation of the critical RRR to RYN change is described in FIG. 18.

[0377] In addition to the RYN and RFN combinations in the C-region, the REN, RLN and RRN combinations also had significantly improved NADH-dependent activity. The RRN variant also maintained WT-level NADPH-dependent activity. These data are summarized in FIG. 12. Additionally, the RRT, RYT, RLR, KYN, MYN and QYN C-region variants also showed improved NADH-dependent activity.

[0378] The results from screening these libraries point strongly to the significance of the three RRR residues in the C-region in determining the co-factor specificity profile. To assess all possible combinations of the 20 amino acids at each of these positions, a high-complexity random RRR library was designed and screened to identify the best variants for activity with NADH. An oligonucleotide with NNK degeneracy at each of the three R positions was used to construct this library, with a theoretical combinatorial potential of 32768 members.
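The 32768 figure counts DNA-level combinations: an NNK codon (N = A/C/G/T, K = G/T) has 4 × 4 × 2 = 32 variants, and 32³ = 32768. The Python sketch below, which assumes only the standard genetic code, checks that NNK covers all 20 amino acids with a single stop codon (TAG), gives the number of distinct protein sequences (20³ = 8000), and estimates the share of clones free of stop codons. The `expected_coverage` helper is hypothetical and assumes every codon combination is equally represented in the library, which real degenerate oligonucleotides only approximate.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' = stop.
AMINO_ACIDS_BY_CODON = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS_BY_CODON[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# NNK: any base at codon positions 1 and 2, G or T at position 3.
NNK_CODONS = ["".join(c) for c in product("ACGT", "ACGT", "GT")]
encoded = [CODON_TABLE[c] for c in NNK_CODONS]
amino_acids = set(encoded) - {"*"}
n_stop = encoded.count("*")

N_POSITIONS = 3  # the three R positions of the C-region
codon_library_size = len(NNK_CODONS) ** N_POSITIONS        # 32**3
protein_library_size = len(amino_acids) ** N_POSITIONS     # 20**3
stop_free_fraction = (1 - n_stop / len(NNK_CODONS)) ** N_POSITIONS


def expected_coverage(clones_screened: int, diversity: int) -> float:
    """Expected fraction of equally represented variants sampled at least once."""
    return 1 - (1 - 1 / diversity) ** clones_screened


print(len(NNK_CODONS), len(amino_acids), n_stop)  # 32 20 1
print(codon_library_size, protein_library_size)   # 32768 8000
print(f"{stop_free_fraction:.3f}")                 # 0.909
```

Under the equal-representation assumption, screening a few thousand clones samples only a small fraction of the 32768 codon combinations, which is consistent with the observation below that only a small proportion of this library was screened.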

[0379] After screening only a small proportion of this library, sequence and activity analysis of the best clones indicated that an R to W mutation at the first R position had the most interesting activity profile. This is also substantiated by bioinformatic analysis of most naturally occurring NAD(P)H-dependent enzyme sequences, suggesting the presence of an aromatic amino acid. This led us to design a PDA™ library in which the first R is forced to be an aromatic amino acid during PDA™ simulations, and in turn to the design of two additional, smaller PDA™ libraries called R1-W and WXX. The computational strategy for their design is described below.

[0380] The best hits from all these new library designs were analyzed (using purified enzymes) for their relative activity at 0.6 and 1.2 mM of each of the two co-factors. Their Km and Kcat values were also determined; the data are presented in FIGS. 13A and B, respectively.

[0381] These clones have “highly improved” NADH-dependent TR activities. In addition to their improved NADH activity, some of the variants also have improved NADPH-dependent activities. This in essence represents the creation of TR variants with better catalytic efficiencies for both co-factors, which is also reflected in the several-fold higher NADH Kcat values for all the variants. The Km for NADH remained unchanged for most of the improved variants, except WRT, which has a two-fold reduced Km for this co-factor. The members of this list coming from either the R1-W or the WXX library are indicated in FIG. 13C. Computational models of the two best clones from the R1-W library are depicted in FIG. 14 to give a structural perspective on their activity.

[0382] The PDA™ Design process for TR has thus identified:

[0383] Five or more variants with activity equal to or better than 50% of WT NADPH activity, with NADH at 1.2 mM.

[0384] At least one variant that meets this activity milestone even at 0.6 mM NADH.

[0385] A large number of variants that also have improved catalytic efficiency for the NADPH activity.

[0386] A best variant with a 13-fold better Kcat/Km and a 2-fold lower Km for NADH compared to WT.
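Because the specificity constant is a ratio, the two fold-changes reported for the best variant also fix the implied change in Kcat. Assuming both figures refer to NADH as co-factor and to the same WT reference, the arithmetic is:

```latex
\frac{(k_{cat}/K_m)_{\text{variant}}}{(k_{cat}/K_m)_{\text{WT}}}
  = \frac{k_{cat,\text{variant}}}{k_{cat,\text{WT}}}\cdot\frac{K_{m,\text{WT}}}{K_{m,\text{variant}}}
\quad\Longrightarrow\quad
13 = \frac{k_{cat,\text{variant}}}{k_{cat,\text{WT}}}\cdot 2
\quad\Longrightarrow\quad
\frac{k_{cat,\text{variant}}}{k_{cat,\text{WT}}} \approx 6.5
```

The 13-fold gain in efficiency would therefore correspond to a roughly 6.5-fold increase in Kcat combined with the 2-fold decrease in Km, in line with the several-fold higher NADH Kcat values noted in paragraph [0381].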

[0387] Thioredoxin Reductase R1-W Library

[0388] A new set of PDA™ simulations was performed to evaluate the use of an aromatic amino acid (F, Y, or W) at the first position of the trio of residues discovered by Xencor to be extremely important in modulating activity levels with NADH and NADPH (corresponding to the position of R in the RYN variants). The new simulations were motivated by the observation that a small number of NAD(P)H-utilizing enzymes contain an aromatic residue at this position, and by the potential for a stacking interaction between the aromatic ring and the adenine ring of NAD(P)H.

[0389] Simulation of 20¹⁰ (~10¹³) sequences resulted in the library shown below, which defines 1296 variants for in vitro screening. The 10 positions were selected by structural analysis of residues critical for cofactor binding. Analysis of the simulation results revealed that sampling amino acid diversity at 6 of the 10 positions would result in a high-quality library of modest size.

[0390] The 4th PDA™ library, with diversity at 6 positions, in the context of W versus R at one position, is defined as:

[0391] LIRRRVI (wt)

[0392] LIWRTVI

[0393] AL ASIV

[0394] FV CN

[0395] EC

[0396] K

[0397] L

[0398] M

[0399] Q

[0400] High-throughput screening of this library yielded the following high-activity WXX clones. These clones have been ranked computationally by performing PDA™ simulations that represent the 4th PDA™ combinatorial library.

[0401] Out of the 1296 possible sequences in this library, the highly active WXX clones rank computationally as follows:

[0402] LIWRTVI 13/1296 (rank/library size)

[0403] LIWLSVI 51/1296

[0404] LIWMSVI 26/1296

[0405] LIWRSVI 46/1296

[0406] Note that these rankings are not intended to be predictive of relative activity: the calculation was designed to define the broadest set of structurally compatible cofactor binding pocket diversity in the smallest number of sequences. All of the library members are in roughly the top 0.002% of the 20⁶ theoretically possible sequence combinations at the 6 positions included in the 4th library, demonstrating a focusing effect of over 10⁴. This furthermore constitutes a focusing effect of at least 10⁹ relative to the 20¹⁰ sequence combinations included in the original simulation.
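The library-size arithmetic behind these focusing figures can be reproduced with a few lines of Python. The sketch below uses only numbers stated in the text (1296 members, 6 diversified positions, 10 simulated positions, and the four computational ranks listed in [0402]-[0405]); the helper name is illustrative. It shows that 1296/20⁶ is about 2×10⁻⁵, i.e. roughly 0.002% of the six-position sequence space.

```python
LIBRARY_SIZE = 1296
SIMULATED_POSITIONS = 10
DIVERSIFIED_POSITIONS = 6

full_space = 20 ** SIMULATED_POSITIONS        # sequences considered in the simulation
library_space = 20 ** DIVERSIFIED_POSITIONS   # sequences possible at the 6 library positions


def focusing_factor(sequence_space: int, library_size: int) -> float:
    """How many-fold the library narrows the theoretical sequence space."""
    return sequence_space / library_size


fraction_of_space = LIBRARY_SIZE / library_space

# Computational ranks of the highly active WXX clones ([0402]-[0405]).
ranks = {"LIWRTVI": 13, "LIWLSVI": 51, "LIWMSVI": 26, "LIWRSVI": 46}

print(f"library fraction of 20^6 space: {fraction_of_space:.2e}")
print(f"focusing vs 20^6:  {focusing_factor(library_space, LIBRARY_SIZE):.2e}")
print(f"focusing vs 20^10: {focusing_factor(full_space, LIBRARY_SIZE):.2e}")
for seq, rank in sorted(ranks.items(), key=lambda kv: kv[1]):
    share = 100 * rank / LIBRARY_SIZE
    print(f"{seq}: rank {rank}/{LIBRARY_SIZE} (top {share:.1f}% of library)")
```

All four active clones fall within the top 4% of the 1296 designed sequences, although, as the text stresses, the ranking was not built to predict relative activity.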

[0407] Note also that these rankings are based purely on simulated interaction with NADH. They do not take into account the specificity of the enzyme for or against NADPH. Since the project objectives did not include NADPH/NADH specificity, comparative modeling of the two cofactor-protein complexes was not performed.

[0408] Additional Variants

[0409] Based on the success of the R1-W library, and the observation of considerable diversity at the 2nd and 3rd R positions in both the simulations and laboratory screening, Xencor constructed a low-complexity (400-member) library to sample all possible WXX combinations. High-throughput screening of this library led to the discovery of several additional variants with high activity using NADH and variable activity using NADPH.

[0410] The 5 best clones from this library, containing diversity only at the 3 RRR positions, are listed below. While the design of this library was directly influenced by all of the previous PDA™ simulation and experimental results, the library was not based on a PDA™ simulation per se. Thus there are no computational rankings for these variants.

[0411] WIS

[0412] WFQ

[0413] WVR

[0414] WMG

[0415] WVG

[0416] Computational Rankings of RYN Thioredoxin Reductase Variants

[0417] The individual “RYN” clones have been ranked computationally by performing PDA™ simulations that represent the 2nd PDA™ combinatorial library constructed and screened by Xencor. Simulation of 20⁸ (2.56×10¹⁰) sequences resulted in the library below, which defines 2304 variants for in vitro screening. The 8 positions were selected by structural analysis of residues critical for cofactor binding.

[0418] The 2nd PDA™ library, with diversity at 8 positions, is defined as:

[0419] L I G D R R R S (wt; positions 127, 165, 167, 169, 190, 191, 195 and 217)

[0420] Q M S N K Y T D

[0421] - L - - Q E N -

[0422] - - - - - L I -
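Because each row of the 2nd library lists residues allowed at the 8 designed positions, the library is the Cartesian product of the per-position residue sets. The sketch below assumes the per-position sets read from the columns of this library (they agree with the TRR Library 2 table later in this section), enumerates all members, and checks that the wild-type and RYN clones are among them; `LIBRARY_2` and `enumerate_library` are illustrative names only.

```python
from itertools import product

# Assumed residue sets at positions 127, 165, 167, 169, 190, 191, 195 and 217 (wild type first)
LIBRARY_2 = ["LQ", "IML", "GS", "DN", "RKQ", "RYEL", "RTNI", "SD"]


def enumerate_library(options):
    """Yield every sequence of a combinatorial library defined by per-position residue sets."""
    for combo in product(*options):
        yield "".join(combo)


members = set(enumerate_library(LIBRARY_2))
print(len(members), "library members")
for clone in ["LIGDRRRS", "LIGDRYNS", "LLGDRYNS", "LMGDRYNS"]:
    print(clone, clone in members)
```

The same function applies to any of the small libraries in this section, since each is defined purely by the residues allowed at each position.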

[0423] Out of the 2304 possible sequences in this library, the wild-type and highly active RYN clones rank as follows:

[0424] LIGDRRRS (wt) 329

[0425] LIGDRYNS 339

[0426] LLGDRYNS 698

[0427] LMGDRYNS 920

[0428] Note that the rankings are not intended to predict relative activity: the calculation was designed to define the broadest set of structurally compatible cofactor-binding-pocket diversity in the smallest number of sequences. All of the library members are in the top 0.00001% of the 20⁸ theoretically possible sequence combinations at the eight positions included in the 2nd library, demonstrating a focusing effect of over 10⁷.

[0429] Note also that these rankings are based purely on simulated interaction with NADH. They do not take into account the specificity of the enzyme for or against NADPH. Since the project objectives did not include NADPH/NADH specificity, comparative modeling of the two cofactor-protein complexes was not performed.

[0430] Novel Thioredoxin Reductase Variants

[0431] Low Complexity Library. The initial success of the RYN variant motivated Xencor to pursue further optimization of this variant by refining the amino acids at the RYN positions, leading to the very small 18-member library shown below.

[0432] R R R

[0433] M Y N

[0434] - F D

[0435] Screening of this library revealed that the RFN combination was of similar activity to the RYN variants discovered previously. According to PDA™ simulations, this clone ranks 7th in this library (RYN ranks 3rd).

[0436] Non-glycosylation variants. Because of the inadvertent introduction of a potential N-linked glycosylation site (consensus N-X-[T/S]) in the RYN and related variants (RYDAFNASKIMQQ), PDA™ simulations were performed to assess the feasibility of eliminating the potential site by substituting the serine (S) two positions downstream of the asparagine (N) in the RYN variants. The simulations indicate that several amino acid substitutions would be favorable, including Ser to Ala, which Xencor then produced and characterized experimentally. In this one-position simulation (NAX), Ala ranked 6th, with Thr and Ser ranked 1st and 2nd, respectively. Experimental data indicate that the Ala substitution has no detectable effect on the activity of the RYN variants.
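The consensus check described above can be sketched as a regular-expression scan of the protein sequence. In the Python below, excluding proline at the X position is an assumption added as a commonly applied refinement of the N-X-[T/S] consensus (it is not stated in the text); the scan reports the Asn in RYDAFNASKIMQQ and shows that the Ser-to-Ala substitution removes the motif.

```python
import re

# Zero-width lookahead so that overlapping motifs are all reported.
# Assumption: X may be any residue except proline, a commonly used refinement
# of the N-X-[T/S] consensus that is not stated in the text above.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq: str) -> list[int]:
    """Return 1-based positions of Asn residues that begin a potential N-glycosylation site."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


ryn_segment = "RYDAFNASKIMQQ"
asn = ryn_segment.index("N")                                    # 0-based index of the Asn
ryn_a = ryn_segment[: asn + 2] + "A" + ryn_segment[asn + 3 :]   # Ser at N+2 replaced by Ala
print(ryn_segment, sequon_positions(ryn_segment))
print(ryn_a, sequon_positions(ryn_a))
```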

[0437] RYN-A (339/2304, 6/20) (rank/original library size, rank/NAX library size)

[0438] RFN-A (7/18, 6/20)

[0439] Computational Strategy

[0440] Primary Goal: Conversion of Arabidopsis thioredoxin reductase activity such that the enzyme efficiently utilizes NADH instead of NADPH

[0441] Basic Outline of Strategy:

[0442] I. generate starting model

[0443] use the E. coli structure (1TDF) to “graft” the coordinates of the NADP cofactor into the coordinate frame of the Arabidopsis structure (1VDC), which does not include cofactor coordinates.

[0444] II. define working cofactor conformation

[0445] a. direct derivation by deleting the 2′-phosphate group from NADP

[0446] b. indirect derivation by superposition of NAD coordinates from various NAD-utilizing enzymes

[0447] III. run PDA simulation(s) to generate combinatorial library possibilities.

[0448] a. define library positions

[0449] b. run simulation(s)

[0450] c. generate library

[0451] Detailed Outline of Strategy

[0452] I. Generation of starting model

[0453] A. The 1VDC structure file was processed to create a more conventional residue numbering system for the structure (the original version used an atypical numbering format so that the numbering agreed with the E. coli structure).

[0454] B. Structure alignment for grafting NADP coordinates from 1TDF onto 1VDC. An alignment was obtained using the C-alpha atoms of residues 117, 119, 151-156, 174-181 and 242-244. This gives an RMSD of 0.48 Å over 19 matched atoms (maximum deviation 0.89 Å).
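The grafting step is a least-squares rigid-body superposition: the transform that best maps the 1TDF C-alphas onto the 1VDC C-alphas is then applied to the NADP atoms. The sketch below uses the Kabsch (SVD) method with NumPy, which is an assumption about tooling, not necessarily what was used here; it expands the residue list above (giving the 19 matched atoms) and checks itself on synthetic coordinates rather than on the real structures.

```python
import numpy as np


def expand_ranges(spec):
    """Expand entries such as 117 or (151, 156) into a flat list of residue numbers."""
    residues = []
    for item in spec:
        if isinstance(item, tuple):
            residues.extend(range(item[0], item[1] + 1))
        else:
            residues.append(item)
    return residues


def kabsch(mobile, target):
    """Least-squares superposition of mobile (N x 3) onto target (N x 3).

    Returns (apply, rmsd, max_dev); apply() maps any coordinates from the mobile
    frame into the target frame, e.g. cofactor atoms carried along with the C-alphas.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    u, _, vt = np.linalg.svd((mobile - mc).T @ (target - tc))   # 3 x 3 covariance matrix
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0         # exclude reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    def apply(xyz):
        return (np.asarray(xyz, dtype=float) - mc) @ rot.T + tc

    dev = np.linalg.norm(apply(mobile) - target, axis=1)
    return apply, float(np.sqrt(np.mean(dev ** 2))), float(dev.max())


residues = expand_ranges([117, 119, (151, 156), (174, 181), (242, 244)])
print(len(residues), "C-alpha atoms")

# Synthetic check: the "mobile" set is a rotated and translated copy of the target set.
rng = np.random.default_rng(0)
target = rng.normal(scale=5.0, size=(len(residues), 3))
t = np.radians(30.0)
rz = np.array([[np.cos(t), -np.sin(t), 0.0], [np.sin(t), np.cos(t), 0.0], [0.0, 0.0, 1.0]])
mobile = target @ rz.T + np.array([10.0, -4.0, 2.5])
apply, rmsd, max_dev = kabsch(mobile, target)
print(f"RMSD {rmsd:.2e} A, max deviation {max_dev:.2e} A")
```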

[0455] C. Note that no minimization was done on the final model.

[0456] II: Defining the working cofactor conformations

[0457] A. The initial cofactor conformation was defined simply by deleting the phosphate group from the NADP cofactor contained in the 1TDF file. This conformation is referred to as NAD_TDF.
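Deleting the phosphate amounts to dropping a few atom records from the cofactor. The sketch below filters PDB-format ATOM/HETATM lines by residue name and atom name using the fixed PDB column positions. The residue name NAP and the atom names P2B, O1X, O2X and O3X are assumptions based on the usual wwPDB naming of the NADP 2′-phosphate; the names in the file actually used should be checked.

```python
# Assumed atom names of the NADP 2'-phosphate (wwPDB component NAP); verify against the file used.
PHOSPHATE_ATOMS = {"P2B", "O1X", "O2X", "O3X"}


def strip_phosphate(pdb_lines, resname="NAP", drop=PHOSPHATE_ATOMS):
    """Return PDB lines with the listed atoms of residue `resname` removed."""
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            atom = line[12:16].strip()   # PDB columns 13-16: atom name
            res = line[17:20].strip()    # PDB columns 18-20: residue name
            if res == resname and atom in drop:
                continue
        kept.append(line)
    return kept
```

Under this assumed naming, the bridging oxygen O2B is kept and becomes the 2′-hydroxyl oxygen of the adenine ribose diol.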

[0458] B. Alternative NAD conformations. Adam Thomason developed Perl scripts that scan the PDB for structures containing NAD cofactors. The scripts then perform a full or partial superposition of the NAD from each extracted PDB file onto the reference NAD_TDF. A large number of NAD conformations were thus collected (see FIG. 19) for use in PDA simulations.

[0459] Simulations have been performed using either the NAD_TDF conformer or the NAD_GRB conformer (from 1GRB, human glutathione reductase), which had the lowest all-atom RMSD to NAD_TDF. Visual inspection of over 100 NAD conformers indicates that the ribose pucker found in NAD_GRB is significantly more prevalent than that in NAD_TDF, suggesting that this conformer is of lower energy. It is possible that the rare conformer seen in NAD_TDF stems from the fact that it was derived from NADP coordinates.

[0460] C. Hydroxyl rotamer states. The orientation of the hydrogen of a hydroxyl group can have a significant influence on side chain-cofactor interactions, particularly hydrogen-bonding interactions. For library 1, a static pair of hydroxyl rotamers was used, because only a single ligand state can be included per simulation within the Xencor implementation of PDA™. Subsequently, the SPA package was developed so that a combinatorial set of ligand states can be included in a simulation. A support program named “makeligands” (from makeligands.f90) was also developed to generate combinatorial sets of hydroxyl rotamer orientations.
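A combinatorial set of hydroxyl states of the kind generated by “makeligands” can be sketched by placing each hydroxyl hydrogen at a series of dihedral angles about its C-O bond and taking the Cartesian product over hydroxyls. The Python below is an illustrative stand-in, not the makeligands.f90 program: it places each hydrogen with the NeRF construction, and the 0.96 Å O-H bond, 109.5° C-O-H angle and 60° dihedral step (6 states per hydroxyl, hence 6×6 = 36 combinations for the adenine ribose diol) are assumed values. The coordinates in the example are hypothetical.

```python
import math
from itertools import product


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def unit(a):
    norm = math.sqrt(sum(x * x for x in a))
    return [x / norm for x in a]


def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """NeRF: place D so that |CD| = bond, angle B-C-D = angle_deg, dihedral A-B-C-D = torsion_deg."""
    ang, tor = math.radians(angle_deg), math.radians(torsion_deg)
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))
    m = cross(n, bc)
    local = (-bond * math.cos(ang),
             bond * math.sin(ang) * math.cos(tor),
             bond * math.sin(ang) * math.sin(tor))
    return [c[i] + local[0] * bc[i] + local[1] * m[i] + local[2] * n[i] for i in range(3)]


def hydroxyl_states(hydroxyls, step_deg=60.0, bond=0.96, angle_deg=109.5):
    """All combinations of hydroxyl-H positions.

    Each hydroxyl is given as (A, C, O): a reference atom, the carbon bearing the
    hydroxyl and the hydroxyl oxygen. Returns a list of tuples, one H position per hydroxyl.
    """
    n_steps = int(round(360.0 / step_deg))
    torsions = [k * step_deg for k in range(n_steps)]
    per_oh = [[place_atom(a, c, o, bond, angle_deg, t) for t in torsions] for a, c, o in hydroxyls]
    return list(product(*per_oh))


# Hypothetical ribose coordinates (Angstrom): (reference atom, carbon, oxygen) for each hydroxyl
o2_hydroxyl = ([0.0, 0.0, 0.0], [1.52, 0.0, 0.0], [2.0, 1.35, 0.0])
o3_hydroxyl = ([1.52, 0.0, 0.0], [2.1, -1.4, 0.3], [3.5, -1.5, 0.2])
states = hydroxyl_states([o2_hydroxyl, o3_hydroxyl])
print(len(states), "hydroxyl rotamer combinations")
```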

[0461] III. PDA simulation(s) to generate combinatorial libraries

[0462] A. Defining library positions

[0463] The current strategy is to enhance interactions between the TRR protein and the adenine portion of NADH, particularly the diol group on the adenine ribose, which is left behind when the phosphate is removed (see FIG. 20).

[0464] B. Library 1 Calculations—performed with PDA™

[0465] The first combinatorial library was generated using the PDA™ simulation package. In this package, ligands are incorporated as part of the “template”, which restricts the number of ligand states per simulation to one. Therefore, the hydroxyl rotamers on the adenine diol were arbitrary for this set of calculations. Furthermore, no charges were assigned to the NAD. The first set of calculations included several amino acid possibilities at position 189. For all subsequent calculations, the identity at this position was restricted to histidine.

[0466] C. Library 1 definition

[0467] The rationale for library 1 was based on a combination of (i) the quality of residues as predicted by ORBIT (based on probability tables generated by an ORBIT Monte Carlo simulation); (ii) structural intuition; and (iii) an emphasis on sampling a diversity of amino acid properties. At all positions, the wild-type residue was included in the library. The most intriguing aspects of the library are various potential hydrogen-bonding interactions between side chains and the cofactor, giving rise to residues E, D and T at position 127, Q and E at position 195, E and Q at position 217, and E at position 255. Because most NADH-utilizing enzymes contain an interaction between a carboxylic side chain and the adenine diol, the prediction of Q and E at position 195 is encouraging.

[0468] TRR Library 1:

[0469] 127 LEDTA

[0470] 165 IML

[0471] 166 G

[0472] 167 G

[0473] 189 H

[0474] 190 RYM

[0475] 191 RQ

[0476] 195 RYQE

[0477] 217 SEQ

[0478] 255 IE

[0479] D. Library 2 calculations—performed with SPA

[0480] Several simulations, using various cofactor conformations and sampling strategies, were performed for the development of library 2.

[0481] (i) The first set of simulations was performed using the NAD_TDF cofactor conformation for the heavy-atom coordinates. Using this conformation and 36 (6×6) hydroxyl rotamer combinations on the adenine diol, simulations were performed with either backbone-ensemble or sub-rotamer sampling strategies.

[0482] (ii) The second set of simulations was performed using the NAD_GRB cofactor conformation for the heavy-atom coordinates. Using this conformation and 36 (6×6) hydroxyl rotamer combinations on the adenine diol, simulations were performed with either backbone-ensemble or sub-rotamer sampling strategies.

[0483] E. Library 2 definition

[0484] The rationale for library 2 was based on a combination of (i) the quality of residues as predicted by SPA (based on output free-energy matrices and comparison of matrices from different simulations); (ii) structural intuition; (iii) an emphasis on sampling a diversity of amino acid properties; and (iv) feedback from Library 1 screens. At all positions, the wild-type residue was included in the library. As before, the most intriguing aspects of the library are various potential hydrogen-bonding interactions between side chains and the cofactor. However, because an alternative cofactor conformer was used in these calculations, new sets of interactions are predicted by SPA, giving rise to residues Q at position 127, S at position 167, T and N at position 195 (FIGS. 3A, B), D at position 217, and E at position 255. S167 (FIG. 3C) was chosen despite a high free-energy value, based on its predicted ability to hydrogen bond to the AO2* oxygen of the adenine diol and the supposition that a small movement would relieve the van der Waals clash. An additional residue, N at position 169, was added to this library, based on the possibility that neutralizing the negative charge at this position would help improve binding affinity for the cofactor (note that N is a conservative mutation, as it is found in the E. coli TRR).

[0485] Most of the residues in library 2 were chosen based on simulations with NAD_GRB. However, I195 was added based on a high propensity for this residue in SPA calculations using the NAD_TDF cofactor conformation.

[0486] TRR Library 2 (for each position: the position as numbered in the text, the corresponding positions in two other numbering schemes, the amino acids included, and the number of amino acids):
126 123 118 R 1
127 124 119 L Q 2
128 125 120 S 1
164 161 150 V 1
165 162 151 I M L 3
166 163 152 G 1
167 164 153 G S 2
168 165 154 G 1
169 166 155 D N 2
170 167 156 S 1
189 186 175 H 1
190 187 176 R K Q 3
191 188 177 R Y E L 4
192 189 178 D 1
193 190 179 A 1
194 191 180 F 1
195 192 181 R T N I 4
196 193 182 A 1
216 213 202 S 1
217 214 203 S D 2
218 215 204 V 1
254 251 242 A 1
255 252 243 I 1
256 253 244 G 1
Total combinations: 2304

[0487] Assays

[0488] Expression

[0489] 1. The NTR coding region cloned in pET29 is expressed in BL21 Star (Invitrogen) cells. The volumes described here are typical for obtaining >50 μg of purified protein, and can be scaled up or down as required.

[0490] 2. Inoculate colonies into a 96-deep-well plate containing 1.5 ml CG + kanamycin (100 μg/ml), and inoculate appropriate controls. Grow overnight cultures at 37° C., 250 rpm.

[0491] 3. The next day, inoculate 200 μl of each overnight culture into 5 ml CG + kanamycin (100 μg/ml) in four 24-well plates for each 96-deep-well plate. Grow at 30° C., 250 rpm, for 3 hr.

[0492] 4. Make glycerol stocks from the remaining overnight cultures and freeze at −80° C.

[0493] 5. Induce the 5 ml cultures with 1 M IPTG to a final concentration of 1 mM. Grow overnight at 30° C., 250 rpm.

[0494] 6. The next day, spin down the cells at maximum speed (Avanti J-20, 5300 rpm) for 10 min. Discard the supernatant; the pellets can be frozen at −80° C. or taken directly into the S.Tag purification procedure.

[0495] S.Tag Purification For 96-Well Plate

[0496] (96 samples from cell pellets; Novagen, cat #69232-3)

[0497] The S.Tag Thrombin Purification Kit employs biotinylated thrombin, which enables simple and specific removal of the enzyme after digestion by capture with streptavidin agarose. The standard protocol calls for batch-wise binding to S-protein agarose, washing, treatment with biotinylated thrombin, and capture with streptavidin agarose, leaving the purified protein in solution.

[0498] Kit components (volume provided; volume for 1 kit/24 samples; volume per sample):
S-protein Agarose (50% slurry in 50 mM Tris-HCl, pH 7.5, 150 mM NaCl, 1 mM EDTA, 0.02% sodium azide): 2 ml; 167 μl slurry/sample
10X Bind/Wash Buffer (200 mM Tris-HCl pH 7.5, 1.5 M NaCl, 1% Triton X-100): 3 × 5 ml; 100 ml of 1X; 1 ml/sample
10X Thrombin Cleavage Buffer (200 mM Tris-HCl pH 8.4, 1.5 M NaCl, 25 mM CaCl₂): 3 ml; 30 ml of 1X; 400 μl/sample
Biotinylated Thrombin: 50 U (1.5 U/μl); 25 U (16.6 μl); 1 U (0.66 μl)/sample
Streptavidin Agarose (50% slurry in phosphate buffer, pH 7.5, 0.02% sodium azide): 2 × 0.4 ml; 1.6 ml slurry; 60 μl slurry/sample

[0499] Additional Materials:

[0500] Whatman Unifilter, 96-well, 800 μl (Fisher, cat #PF7700-2804)

[0501] Bug Buster Protein Extraction Reagent (VWR, cat #80500-208)

[0502] Protocol (5 ml Expression Cultures)

[0503] 1. Thaw frozen pellets (from 5 ml cultures) at RT for about 30 min

[0504] 2. Add 500 μl of Bug Buster HT, vortex to resuspend pellets, and shake at RT for 20 min

[0505] 3. Spin at maximum speed or 3000×g for 20 min. Transfer the supernatant (cell lysate) containing soluble proteins to a new plate.

[0506] 4. Use 150 μl of cell lysate for purification; save the remainder for later use

[0507] For 150 μl of lysate:

[0508] Adjust the Tris-HCl and NaCl concentrations to 20 mM Tris and 150 mM NaCl, pH 7.5 (volumes per sample; ×100 in parentheses):
150 μl Bug Buster lysate
10 μl 1 M Tris-HCl, final 20 mM (1 ml)
15 μl 5 M NaCl, final 0.15 M (1.5 ml)
325 μl H₂O (32.5 ml)
500 μl total; aliquot 350 μl of the Tris-HCl/NaCl/H₂O mix per sample

[0509] 5. Seal filter plate bottom with aluminum tape

[0510] 6. Add 167 μl of S-protein agarose mix using wide mouth tips

[0511] 7. Add the adjusted lysate to the filter plate, and seal the plate with aluminum tape

[0512] 8. Bind at RT for 30 min-1 hr on an orbital shaker (place the plate on its side; do not shake vigorously, as this tends to denature the protein)

[0513] 9. Remove aluminum tape from the bottom, apply vacuum

[0514] 10. Wash 2 times with 500 μl of 1×Bind/Wash Buffer, apply vacuum

[0515] 11. Equilibrate twice with 1× Thrombin Cleavage Buffer using approximately 1× slurry volume (200 μl), applying very low vacuum

[0516] 12. Re-seal filter plate bottom with aluminum foil

[0517] 13. Make a mix of 1× Thrombin Cleavage Buffer and Biotinylated Thrombin

[0518] Master mix (1 kit for 24 samples); volumes per sample, with ×100 in parentheses:
1X Thrombin Cleavage Buffer: 80 μl (8 ml)
Biotinylated Thrombin (1.5 U/μl): 0.66 μl (66 μl)
Aliquot 80.7 μl per sample

[0519] 14. Gently shake at RT for 1-2 hr on a micromixer (setting 5, amplitude 4)

[0520] 15. Add 60 μl slurry of Streptavidin Agarose

[0521] 16. Incubate on orbital shaker at RT for 10 min

[0522] 17. Remove foil seal from the bottom of the filter plate

[0523] 18. Spin at 500×g, 2 min

[0524] 19. To elute more protein, add 80 μl of 1× cleavage buffer and spin at 500×g, 2 min

[0525] 20. Add an equal volume of 50% glycerol, mix thoroughly, and store at 4° C. temporarily; for long-term storage, freeze at −80° C.

[0526] BCA Assay

[0527] BCA Protein Assay Reagent Kit (Pierce, Cat #23227)

[0528] 1. Preparation of standards and working reagent

[0529] a. Standards (working range is 0.125-2 μg/μl)
Tube; volume of diluent (μl); volume and source of BSA; final BSA concentration (μg/μl)
A; 0; 300 μl stock; 2.000
B; 125; 375 μl stock; 1.500
C; 325; 325 μl stock; 1.000
D; 175; 175 μl of B; 0.750
E; 325; 325 μl of C; 0.500
F; 325; 325 μl of E; 0.250
G; 325; 325 μl of F; 0.125
H; 400; 100 μl of G; 0.025
I; 400; 0 μl; 0.000 (blank)

[0530] For the assay: 5 μl of each standard + 20 μl of ddH₂O = 25 μl total
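Each standard's concentration follows from C = C_source × V_source / (V_source + V_diluent), applied tube by tube down the serial dilution. The short Python check below assumes a 2.0 μg/μl BSA stock, as implied by tube A (no diluent), and reproduces the final-concentration column of the standards table.

```python
STOCK_UG_PER_UL = 2.0  # assumed BSA stock concentration, implied by tube A

# tube: (diluent ul, volume of BSA solution ul, source of that solution)
SCHEME = {
    "A": (0, 300, "stock"),
    "B": (125, 375, "stock"),
    "C": (325, 325, "stock"),
    "D": (175, 175, "B"),
    "E": (325, 325, "C"),
    "F": (325, 325, "E"),
    "G": (325, 325, "F"),
    "H": (400, 100, "G"),
    "I": (400, 0, None),  # blank
}


def dilution_series(scheme, stock):
    """Final concentration of each tube: C_source * V_source / (V_source + V_diluent)."""
    conc = {"stock": stock}
    for tube, (diluent, volume, source) in scheme.items():  # dicts keep insertion order
        conc[tube] = conc[source] * volume / (volume + diluent) if volume else 0.0
    del conc["stock"]
    return conc


for tube, c in dilution_series(SCHEME, STOCK_UG_PER_UL).items():
    print(f"{tube}: {c:.3f} ug/ul")
```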

[0531] b. Working reagents

[0532] Mix 50 ml of Reagent A with 1 ml of Reagent B

[0533] *The working reagent is stable for several days when stored in a closed container at room temperature

[0534] 2. Preparation of samples in 96-well plate.

[0535] 5 μl of purified protein (from step 20 of Purification procedure)

[0536] 20 μl of ddH₂O

[0537] Mix well

[0538] 3. Assay procedure

[0539] a. Add 200 μl of Working Reagent to each well containing 25 μl of standard or sample

[0540] b. Mix plate thoroughly on a plate shaker for 30 seconds

[0541] c. Cover plate with aluminum foil tape

[0542] d. Incubate at 37° C. for 30 minutes

[0543] e. Cool plate to room temperature

[0544] f. Measure the absorbance at 562 nm on a plate reader

[0545] 4. Use Excel to plot the standard curve and determine the protein concentration of the samples
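Plotting the standard curve amounts to a regression of blank-corrected A562 on BSA concentration, which is then inverted for each sample. The sketch below uses an ordinary least-squares straight line (what a linear Excel trendline computes); this is a simplifying assumption, since BCA standard curves can depart from linearity and a polynomial or point-to-point fit may be preferable over a wide range. The absorbance values are hypothetical. Because standards and samples are both prepared as 5 μl plus 20 μl of water, no extra dilution correction is applied.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration(a562, blank, slope, intercept):
    """Protein concentration (ug/ul) of a sample from its raw A562 reading."""
    return ((a562 - blank) - intercept) / slope


# Hypothetical plate-reader values for standards B-G plus the blank (tube I)
standards = [1.5, 1.0, 0.75, 0.5, 0.25, 0.125]
raw_a562 = [1.25, 0.85, 0.65, 0.45, 0.25, 0.15]
blank = 0.05
slope, intercept = fit_line(standards, [a - blank for a in raw_a562])
print(f"slope {slope:.3f} per ug/ul, intercept {intercept:.4f}")
print(f"sample: {concentration(0.45, blank, slope, intercept):.3f} ug/ul")
```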

[0546] 5. Normalize protein concentration for assay

[0547] a. Run a protein gel of the normalized protein to confirm concentration

[0548] b. Stain with SYPRO Orange for 30 min-1 hr (and/or Coomassie blue overnight)

[0549] c. Visualize the gel on an Alpha Innotech Corporation Imager

[0550] d. Perform densitometry using Kodak 1D 3.5 Network software

[0551] Thioredoxin Reductase Assay

[0552] 1. The assay is set up in 384-well microtiter plates with a 50 μl final volume per assay/well. Up to four 96-well plates can be transferred into one 384-well plate; the specific pattern should be noted at the time of transfer.
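The text leaves the transfer pattern to be recorded at the time of transfer. One common choice, assumed here purely for illustration, interleaves the four source plates so that each 96-well position occupies a 2×2 block of the 384-well plate (plate 1 at A1, plate 2 at A2, plate 3 at B1, plate 4 at B2, and so on). A small mapping function such as the hypothetical `well_96_to_384` below makes the chosen pattern explicit and easy to record.

```python
ROWS_96 = "ABCDEFGH"
ROWS_384 = "ABCDEFGHIJKLMNOP"


def well_96_to_384(plate: int, well: str) -> str:
    """Map well (e.g. 'H12') of source plate 1-4 to its 384-well position in an interleaved layout."""
    if plate not in (1, 2, 3, 4):
        raise ValueError("plate must be 1-4")
    q = plate - 1
    row = ROWS_96.index(well[0].upper())   # raises ValueError for rows outside A-H
    col = int(well[1:])
    if not 1 <= col <= 12:
        raise ValueError("column must be 1-12")
    return f"{ROWS_384[2 * row + q // 2]}{2 * (col - 1) + q % 2 + 1}"


for plate in (1, 2, 3, 4):
    print(plate, well_96_to_384(plate, "A1"), well_96_to_384(plate, "H12"))
```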

[0553] 2. Transfer 5 μl of normalized protein samples to the wells of the 384-well plate. NADPH or NADH at 1.2 mM (or other appropriate concentrations) and 2 μM purified thioredoxin substrate are used in the assay.

[0554] 3. Prepare assay mix (volumes for 1 reaction; 300 reactions):
ddH₂O: 35.1 μl; 10.53 ml
1 M Tris pH 8.0: 5.0 μl; 1.5 ml
0.5 M EDTA: 1.0 μl; 300 μl
20 mM DTNB: 0.5 μl; 150 μl
25 mM NADPH or NADH: 2.4 μl; 720 μl
100 μM purified thioredoxin: 1 μl; 300 μl
Total: 45 μl; 13.5 ml
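The final assay concentrations follow from C_final = C_stock × V_stock / V_final with V_final = 50 μl (5 μl of protein plus 45 μl of assay mix). The Python check below assumes 1.0 μl of 0.5 M EDTA per reaction, which is what the 300-reaction EDTA volume (300 μl) and the 45 μl total imply; with that value the mix sums to 45 μl and gives the 1.2 mM NAD(P)H and 2 μM thioredoxin stated in step 2.

```python
FINAL_VOLUME_UL = 50.0  # 5 ul protein + 45 ul assay mix
WATER_UL = 35.1

# component: (stock concentration, unit, ul of stock per reaction); EDTA volume is an assumption
MIX = {
    "Tris pH 8.0": (1.0, "M", 5.0),
    "EDTA": (0.5, "M", 1.0),
    "DTNB": (20.0, "mM", 0.5),
    "NAD(P)H": (25.0, "mM", 2.4),
    "thioredoxin": (100.0, "uM", 1.0),
}


def final_concentrations(mix, final_volume):
    """C_final = C_stock * V_stock / V_final, reported in the stock's unit."""
    return {name: (stock * vol / final_volume, unit) for name, (stock, unit, vol) in mix.items()}


mix_volume = WATER_UL + sum(vol for _, _, vol in MIX.values())
print(f"assay mix per reaction: {mix_volume:.1f} ul; for 300 reactions: {mix_volume * 300 / 1000:.1f} ml")
for name, (conc, unit) in final_concentrations(MIX, FINAL_VOLUME_UL).items():
    print(f"{name}: {conc:g} {unit}")
```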

[0555] 4. Use Titertek Multidrop 384 to add 45 μl of assay mix

[0556] 5. Immediately place the plate in a Spectramax plate reader to begin data collection

[0557] 6. For measurement of kinetic parameters (kcat and Km), the following substrate concentration ranges were generally used:

[0558] NADPH: 0.00, 0.01, 0.02, 0.04, 0.08, 0.15, 0.3, 0.6, 1.2, 2.5, 5.0 and 10.0 mM

[0559] NADH: 0.02, 0.04, 0.08, 0.15, 0.3, 0.6, 1.2, 2.5, 5.0, 10.0 and 20.0 mM.

[0560] The initial reaction rate in the linear range was determined for each concentration. The data were analyzed using GraphPad Prism software to fit the standard Michaelis-Menten equation.
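GraphPad Prism performed the nonlinear regression here. As an illustration only, the Python sketch below fits v = Vmax·[S]/(Km + [S]) by least squares using a simple profile approach swapped in for Prism's algorithm: for any trial Km the best Vmax has a closed form, so only Km is searched, by golden-section search on a log scale between assumed bounds of 10⁻⁴ and 10³ mM. kcat then follows as Vmax divided by the molar enzyme concentration in the well. The rates in the example are synthetic and noise-free.

```python
import math


def _best_vmax(s, v, km):
    """For a fixed Km, return (SSE, Vmax) where Vmax minimizes the squared error analytically."""
    x = [si / (km + si) for si in s]
    vmax = sum(vi * xi for vi, xi in zip(v, x)) / sum(xi * xi for xi in x)
    sse = sum((vi - vmax * xi) ** 2 for vi, xi in zip(v, x))
    return sse, vmax


def fit_michaelis_menten(s, v, km_lo=1e-4, km_hi=1e3, iterations=200):
    """Least-squares Michaelis-Menten fit; returns (Km, Vmax) in the units of s and v."""
    lo, hi = math.log(km_lo), math.log(km_hi)
    g = (math.sqrt(5.0) - 1.0) / 2.0  # golden-section ratio
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = _best_vmax(s, v, math.exp(c))[0], _best_vmax(s, v, math.exp(d))[0]
    for _ in range(iterations):
        if fc < fd:          # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = _best_vmax(s, v, math.exp(c))[0]
        else:                # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = _best_vmax(s, v, math.exp(d))[0]
    km = math.exp((lo + hi) / 2.0)
    return km, _best_vmax(s, v, km)[1]


# Synthetic, noise-free rates at the nonzero NADPH concentrations listed above (Vmax 2.0, Km 0.1 mM)
s = [0.01, 0.02, 0.04, 0.08, 0.15, 0.3, 0.6, 1.2, 2.5, 5.0, 10.0]
v = [2.0 * si / (0.1 + si) for si in s]
km, vmax = fit_michaelis_menten(s, v)
print(f"Km = {km:.4f} mM, Vmax = {vmax:.4f}")
```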

[0561] Preparation of Thioredoxin h (N Terminal His Tag) for Assay Use

[0562] Culture Preparation:

[0563] 1. Inoculate a 2 liter expression culture with an overnight culture of Thioredoxin-codon opt.e-coli/pET28b in BL21 Star (DE3) expression cells. This yields >100 mg of purified protein.

[0564] 2. After the growth period, induce cells with 1 M IPTG to a final concentration of 1 mM IPTG. Grow overnight at 30° C., 250 rpm.

[0565] 3. The next day, spin down the 2 L culture in twenty 50 ml Falcon tubes, discarding the supernatant so that each tube retains the pellet from 100 ml of culture. Freeze pellets at −80° C. before continuing with supernatant preparation and His-tag purification.

[0566] Supernatant Preparation:

[0567] 1. Resuspend each of the 20 pellets in 1 ml Bugbuster and shake at 250 rpm, room temperature, for 20 min.

[0568] 2. Spin down the cells and combine the supernatants in a 50 ml Falcon tube. Add an equal volume of 2× Loading buffer with 2-mercaptoethanol. Proceed with purification.

[0569] His-tag Protein Purification:

[0570] 1. Add 6 μl Clontech TALON Superflow resin suspension to four 50ml Falcon tubes.

[0571] 2. Wash resin with 30 ml of 1×Loading buffer twice

[0572] 3. Bind protein to resin by gently agitating at room temperature for 20 min.

[0573] 4. Wash resin in 30 ml of 1× Loading buffer at room temperature for 10 min.

[0574] 5. Resuspend resin in 3 ml of 1×Loading buffer.

[0575] 6. Combine suspensions from all four tubes into one Clontech 10 ml gravity-flow column.

[0576] 7. Wash resin with 15 ml of 1×Loading buffer.

[0577] 8. Resuspend resin in 20 ml of 250 mM imidazole elution buffer. Elute protein into a 50 ml tube twice.

[0578] 9. Continue with imidazole removal by filtration and sample concentration, or freeze at −20° C. for later use.

[0579] Filtration and Concentration of Purified Thioredoxin:

[0580] 1. Run the purified protein sample through Millipore Ultrafree-4 Biomax 5K filter tubes.

[0581] 2. Wash samples three times with Filtration Wash buffer.

[0582] 3. Combine the concentrated protein samples. Perform a BCA assay to determine concentration, and then dilute to 100 μM with 50% glycerol, 20 mM Tris-HCl pH 8.0.

[0583] 2×Loading Buffer

[0584] 100 mM NaPO4 pH 8.0

[0585] 10 mM Tris, pH 8.0

[0586] 600 mM NaCl

[0587] 20 mM Imidazole

[0588] 10% Ethylene glycol

[0589] For 2× Loading buffer with 2 mM 2-mercaptoethanol, add 0.156 μl/ml

[0590] 250 mM Imidazole Elution Buffer

[0591] 50 mM NaPO4 pH 8.0

[0592] 5 mM Tris, pH 8.0

[0593] 200 mM NaCl

[0594] 250 mM Imidazole

[0595] 10% Ethylene glycol

[0596] Filtration Buffer (for Imidazole Removal)

[0597] 50 mM NaPO4

[0598] 10 mM Tris, pH 8.0

[0599] 200 mM NaCl

[0600] 10% Ethylene glycol

[0601] ddH₂O

Example 2

[0602] Transformation of Plants with Variant TR Proteins

[0603] Overview

[0604] A gene encoding an oleosin-TR fusion protein, an oleosin-TR-reductase fusion protein or an oleosin-hybrid TR/TR-reductase fusion protein can be incorporated into plant cells using conventional recombinant DNA technology. Generally, this involves inserting a DNA molecule encoding an oleosin-TR fusion protein, an oleosin-TR-reductase fusion protein or an oleosin-hybrid TR/TR-reductase fusion protein into an expression system as described above.

[0605] Breeding

[0606] Plants expressing an oleosin-TR fusion protein, an oleosin-TR-reductase fusion protein or an oleosin-hybrid TR/TR-reductase fusion protein, in combination with other characteristics important for production and quality, can be incorporated into plant lines through breeding approaches and techniques known in the art. Where a plant expressing an oleosin-TR fusion protein, an oleosin-TR-reductase fusion protein or an oleosin-hybrid TR/TR-reductase fusion protein is obtained, the transgene is moved into commercial varieties using traditional breeding techniques, without the need for genetically engineering the allele and transforming it into the plant.

[0607] Plants having the capacity for apomictic reproduction, in which maternal tissue gives rise to offspring, can be transformed to express an oleosin-TR fusion protein, an oleosin-TR-reductase fusion protein or an oleosin-hybrid TR/TR-reductase fusion protein, and the introduced alleles can be maintained in desired backgrounds by apomictic breeding.

[0608] Isolation of TR and TR-reductase Genes and in vitro Assays

[0609] In one embodiment, TR genes from Arabidopsis, wheat, a mammalian source such as calf, and E. coli can be isolated and expressed in E. coli using bacterial expression vectors, and the resulting protein products can be purified. In another embodiment, TR-reductase genes from Arabidopsis and E. coli can be isolated, expressed in E. coli and purified. In addition, the TR/TR-reductase gene can be isolated or obtained from Mycobacterium leprae, expressed in E. coli and purified. In a preferred embodiment, M. leprae codons may be altered for optimization in any given host, such as an E. coli host cell or a plant species. Codon usage tables for many organisms are known and available, permitting codon optimization of coding sequences tailored for a particular host.
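A codon usage table can drive a simple "most-preferred synonymous codon" replacement, one of the basic forms of codon optimization. The Python sketch below replaces each codon with the synonymous codon of highest frequency in a host usage table, using the standard genetic code; the tiny usage table is hypothetical (a real one would come from a published table for the chosen host), and practical optimization may weigh further factors that this sketch ignores.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES) for j, b in enumerate(BASES) for k, c in enumerate(BASES)}


def translate(cds: str) -> str:
    return "".join(CODE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def optimize_codons(cds: str, usage: dict[str, float]) -> str:
    """Replace each codon with its most frequent synonym in `usage`; ties keep the original codon."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        synonyms = [c for c, aa in CODE.items() if aa == CODE[codon]]
        out.append(max(synonyms, key=lambda c: (usage.get(c, 0.0), c == codon)))
    return "".join(out)


# Hypothetical host preferences (relative frequencies); codons absent from the table count as 0
usage = {"CTG": 0.50, "TTA": 0.10, "CGT": 0.40, "AGA": 0.02}
original = "ATGTTAAGATAA"
optimized = optimize_codons(original, usage)
print(optimized, translate(original) == translate(optimized))
```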

[0610] In another embodiment, TR-reductases with altered cofactor specificity are prepared using targeted mutagenesis or random mutagenesis, and tested for specific mutations at the cofactor binding site (Shiraishi et al. (1998) Arch Biochem Biophys 358(1):104-115; Galkin et al. (1997) Protein Eng 10(6):687-690; Carugo et al. (1997) Proteins 28(1):10-28; Hurley et al. (1996) Biochemistry 35(18):5670-8), and/or by addition of organic solvent (Holmberg et al. (1999) Protein Eng 12(10):851-856). Determination of mutations could be assisted by computer programs such as the one developed by Mayo and Dahiyat (Chem & Eng News Oct. 6, 1997, pages 9-10). Each of the foregoing references is incorporated herein by reference in its entirety.

[0611] Combinations of different TRs and TR-reductases are used in a matrix to determine which TR and TR-reductase combination is most effective in the reduction of wheat storage proteins and the milk storage protein β-lactoglobulin in vitro. Preferably, a combination of TR and TR-reductase is tested. These experiments are carried out as described in Del Val et al. ((1999) Jnl Allerg Clin Immunol 103:690-697). Inbred high-IgE-responder atopic dogs are obtained and further prepared by sensitization with commercial extracts of food preparations including milk and wheat. Skin tests are performed using the Type I hypersensitivity reaction. Evans blue dye is injected intravenously shortly before skin testing. Aliquots of wheat gruel, whole cow's milk extract and pure β-lactoglobulin are injected intradermally. Skin tests are read blindly by scoring 2 perpendicular diameters of each blue spot. The ability of oleosin-TR, oleosin-TR-reductase and combinations thereof to affect the allergic response is measured in the presence and absence of exogenous NADPH or NADH.

[0612] Construction of Plant Expression Vectors

[0613] The Arabidopsis TR and TR-reductase gene sequences have been published (Rivera-Madrid et al. (1995) Proc Natl Acad Sci USA 92:5620-5624; Jacquot et al. (1994) J Mol Biol 235:1357-1363), and these genes can be isolated by PCR.

[0614] In one embodiment, both the Arabidopsis TR and TR-reductase genes are translationally fused to both the N- and C-terminal ends of oleosin. This open reading frame is under the transcriptional control of appropriate promoter and terminator sequences for expression in plants. In a preferred embodiment, the phaseolin promoter and terminator sequences are used to create Arabidopsis TR (ATR) and Arabidopsis TR-reductase (ATRR) constructs.

[0615] Expression in Arabidopsis

[0616] In one embodiment, Arabidopsis is used as a model system for the initial testing of oleosin-ATR and oleosin-ATRR expression constructs. Arabidopsis seeds contain oleosin-coated oil bodies very similar to those of crop species, especially oilseed crop species, that can be used for commercial production of TR. Expression of oleosin-TR and oleosin-TR-reductase in Arabidopsis is used to obtain oleosin-TR and oleosin-TR-reductase fusions in oil bodies and to determine whether these fusion proteins are biologically active. Both N- and C-terminal fusions of both TR and TR-reductase to oleosin are made and tested. In a further embodiment, an oleosin fusion to the natural TR/TR-reductase fusion gene from M. leprae is tested. Accumulation of these fusion proteins is quantified by Western blotting, using antibodies specific for oleosin and/or TR and TR-reductase. Arabidopsis is useful for this purpose because the time required to regenerate and grow transformed Arabidopsis plants and to determine transgene expression and accumulation of expressed products in seeds is much shorter than for most crop species.

[0617] Construction of Plant Expression Vectors

[0618] Plant expression vectors are constructed using other genes encoding TR and TR-reductase including, but not limited to, TR genes from wheat, TR genes from a mammalian source such as calf, the TR gene from E. coli, the TR-reductase gene from E. coli, and the TR/TR-reductase gene from M. leprae. Either or both of these genes are translationally fused to both the N- and C-terminal ends of oleosin. The open reading frame of any such construct is under the transcriptional control of appropriate promoter and terminator sequences. In a preferred embodiment, the phaseolin promoter and terminator sequences are used to construct plant expression vectors which are designated as TR′ and TR-reductase. Even more preferably, the phaseolin promoter and terminator sequences are used to construct plant expression vectors which are designated as TR′ and TR-reductase′.

[0619] Expression in Safflower

[0620] Plant transformation vectors as described above are used to transform safflower using methods known to those skilled in the art. In a preferred embodiment, safflower is transformed by a method adapted from that disclosed by Baker and Dyer (Plant Cell Rep (1996) 16:106-110). Expression is assayed using Northern and Western blotting. The ability of the TR′ and TR-reductase′ constructs to reduce wheat storage proteins and the milk storage protein β-lactoglobulin is tested. A minimum of 25 independently transformed transgenic safflower plants is generated for each construct. All the transgenic target crop plants are tested for oleosin-TR′ and oleosin-TR-reductase′ expression. The results from this analysis indicate which transformation event results in the highest and/or most optimal TR′ or TR-reductase′ activity. Transgenic lines transformed with this construct are subjected to further analyses. The quantity of TR′ and TR-reductase′ is determined using quantitative Western blotting analysis. The specific activity of the oleosin fusions is compared to the specific activity of the “free” TR′ and TR-reductase′ produced in E. coli.

[0621] Plant lines with the highest expression are propagated. Homozygous and doubled haploid plants possessing a stable genotype can be produced to ensure stable transgene inheritance in subsequent generations.

[0622] Preparation of Biotinylated TR

[0623] In one embodiment, TR can be biotinylated in vitro by chemical modification of lysine residues using agents such as biotinyl-N-hydroxysuccinimide ester. In an alternate embodiment, in vivo site-specific biotinylation using a biotin-domain peptide from the biotin carboxyl carrier protein of E. coli acetyl-CoA carboxylase may be used as described by Smith et al. ((1998) Nuc Acid Res 26:1414-1420). A recombinant thioredoxin capable of being biotinylated in vivo by the endogenous biotinylation machinery of the E. coli host (BIOTRX) is constructed by inserting an oligonucleotide encoding a 23-amino-acid biotinylation recognition peptide in frame at the 5′ end of E. coli trxA, creating the construct pBIOTRX. Cells containing the pBIOTRX plasmid are grown in the absence of exogenous biotin, and the amount and solubility of BIOTRX protein are determined. Up to 10% of total cellular protein is found to be BIOTRX protein, while only a low amount of tritiated biotin is incorporated into BIOTRX protein and BIOTRX binding to immobilized avidin or immobilized avidin-alkaline phosphatase is low. Addition of 10 μg/ml biotin to the pre-induction medium of pBIOTRX-transformed cells improves the overall extent of biotin incorporation.

[0624] Preparation of Biotinylated Oil Bodies-TR Mixtures

[0625] Avidin or streptavidin is used to link the biotinylated TR to biotinylated oil bodies. Purified biotinylated TR is mixed with biotinylated oil bodies at different ratios. The efficacy of these mixtures in reducing allergenicity and improving dough quality in wheat is tested, as is their efficacy in reducing allergenicity in milk preparations. The controls include wild-type safflower oil bodies and wild-type safflower oil bodies mixed, but not linked, with TR.

We claim:
 1. A method for altering the cofactor specificity of thioredoxin reductase comprising computational mutagenesis.
 2. A method according to claim 1 for altering the cofactor specificity of thioredoxin reductase comprising: a) inputting a set of coordinates for a thioredoxin reductase (TR) scaffold protein comprising amino acid positions; b) applying at least one protein design cycle; and c) generating a set of candidate variant proteins with altered cofactor dependency.
 3. A method according to claim 2 wherein said TR scaffold proteins are selected from the group consisting of E. coli, Bacillus subtilis, Mycobacterium leprae, Saccharomyces, Neurospora crassa, Arabidopsis, and human.
 4. A method according to claim 1 or 2 wherein said cofactor specificity of said variant TR is NADPH or NADH.
 5. A method according to claim 1 or 2 wherein said cofactor specificity of said variant TR is switched to NADH.
 6. A method according to claim 1 or 2 wherein said cofactor specificity of said variant TR is altered such that said variant preferentially binds NADPH compared to NADH.
 7. A method according to claim 1 or 2 wherein said cofactor specificity of said variant TR is altered such that said variant preferentially binds NADH compared to NADPH.
 8. A method according to claim 1 or 2 wherein said cofactor specificity of said variant TR is altered such that said variant exhibits improved catalytic efficiency for NADPH as compared to a wild-type TR protein.
 9. A method for altering the substrate specificity of thioredoxin reductase comprising: a) inputting a set of coordinates for a thioredoxin reductase scaffold protein comprising amino acid positions; b) applying at least one protein design cycle; and c) generating a set of candidate variant proteins with altered substrate specificity.
 10. A variant thioredoxin reductase (TR) protein according to claim 9 wherein said variant TR protein reduces a thioredoxin protein obtained from an organism selected from the group consisting of E. coli, Bacillus subtilis, Mycobacterium leprae, Saccharomyces, Neurospora crassa, Arabidopsis, and human.
 11. A variant TR protein according to claim 1 or 2, wherein said variant protein is fused to a second protein, wherein said second protein is a wild-type TR protein, thioredoxin, or a variant TR protein.
 12. The variant TR protein according to claim 11, wherein said variant protein is fused to said second protein through a linker.
 13. A variant TR protein according to claim 1 or 2 wherein said variant TR protein has from 1 to 3 amino acid substitutions as compared to the wild-type Arabidopsis TR protein.
 14. A variant TR protein according to claim 13 wherein said amino acid substitutions are selected from positions A4, A5 and A6.
 15. A variant TR protein according to claim 14 wherein said amino acid substitutions are selected from the group of substitutions consisting of RA4W, RA5L, RA5M, RA5I, RA5F, RA5V, RA5Y, RA6T, RA6S, RA6Q, RA6G, and RA6N.
 16. A variant TR protein according to claim 15 comprising the amino acid substitutions RA4W and RA6T.
 17. A variant TR protein according to claim 15 comprising the amino acid substitutions RA4W, RA5L, and RA6S.
 18. A variant TR protein according to claim 15 comprising the amino acid substitutions RA5Y and RA6N.
 19. A variant TR protein according to claim 15 comprising the amino acid substitutions RA4W, RA5F, and RA6Q.
 20. A method for altering the cofactor specificity of a target protein comprising: a) inputting a set of coordinates for a scaffold protein comprising amino acid positions; b) applying at least one protein design cycle; and c) generating a set of candidate variant proteins with altered cofactor specificity.
 21. A method according to claim 1, 2, 9 or 20 wherein said protein design cycle comprises protein design automation (PDA™).
 22. A method according to claim 1, 2, 9 or 20 wherein said protein design cycle comprises a sequence prediction algorithm.
 23. A method according to claim 1, 2, 9 or 20 wherein said protein design cycle comprises a force field calculation.
 24. A variant thioredoxin reductase (TR) protein comprising an isolated polypeptide molecule of Formula I S₁-A₁-A₂-S₂-A₃-A₄-A₅-S₃-A₆-S₄ (I) wherein a) S₁ comprises a polypeptide sequence selected from the group consisting of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, and SEQ ID NO:7, or a sequence having substantial similarity thereto; b) S₂ comprises a polypeptide sequence selected from the group consisting of SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, and SEQ ID NO:14, or a sequence having substantial similarity thereto; c) S₃ comprises a polypeptide sequence selected from the group consisting of SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, and SEQ ID NO:21, or a sequence having substantial similarity thereto; d) S₄ comprises a polypeptide sequence selected from the group consisting of SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, and SEQ ID NO:28, or a sequence having substantial similarity thereto; e) A₁ is an amino acid moiety selected from the group consisting of serine, valine, glycine, alanine, leucine, isoleucine, methionine, phenylalanine, and tryptophan; f) A₂ is an amino acid moiety selected from the group consisting of alanine, glycine, valine, leucine, isoleucine, methionine, phenylalanine, and tryptophan; g) A₃ is an amino acid moiety selected from the group consisting of histidine, aspartic acid, glutamic acid, arginine, leucine, serine, threonine, cysteine, asparagine, glutamine, and tyrosine; h) A₄ is an amino acid moiety selected from the group consisting of arginine, alanine, glycine, valine, leucine, isoleucine, methionine, phenylalanine, and tryptophan; i) A₅ is an amino acid moiety selected from the group consisting of arginine, asparagine, glutamine, aspartic acid, glutamic acid, cysteine, serine, threonine, and lysine; j) A₆ is an amino acid moiety selected from the group consisting of arginine, glutamic acid, asparagine, glutamine, aspartic acid, cysteine, serine, threonine, and lysine; provided that at least A₁ is not serine; A₂ is not alanine; A₃ is not histidine; A₄ is not arginine; A₅ is not arginine; or A₆ is not arginine.
 25. The polypeptide molecule according to claim 24, wherein S₁ consists of a polypeptide sequence having the sequence selected from the group consisting of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, and SEQ ID NO:7.
 26. The polypeptide molecule according to claim 24, wherein S₂ consists of a polypeptide sequence having the sequence selected from the group consisting of SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, and SEQ ID NO:14.
 27. The polypeptide molecule according to claim 24, wherein S₃ consists of a polypeptide sequence having the sequence selected from the group consisting of SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, and SEQ ID NO:21.
 28. The polypeptide molecule according to claim 24, wherein S₄ consists of a polypeptide sequence having the sequence selected from the group consisting of SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, and SEQ ID NO:28.
 29. The polypeptide molecule according to claim 24, wherein S₁ is the polypeptide sequence set forth in SEQ ID NO:1, S₂ is the polypeptide sequence set forth in SEQ ID NO:8, S₃ is the polypeptide sequence set forth in SEQ ID NO:15, and S₄ is the polypeptide sequence set forth in SEQ ID NO:22.
 30. The polypeptide molecule according to claim 24, wherein S₁ is the polypeptide sequence set forth in SEQ ID NO:2, S₂ is the polypeptide sequence set forth in SEQ ID NO:9, S₃ is the polypeptide sequence set forth in SEQ ID NO:16, and S₄ is the polypeptide sequence set forth in SEQ ID NO:23.
 31. The polypeptide molecule according to claim 24, wherein S₁ is the polypeptide sequence set forth in SEQ ID NO:3, S₂ is the polypeptide sequence set forth in SEQ ID NO:10, S₃ is the polypeptide sequence set forth in SEQ ID NO:17, and S₄ is the polypeptide sequence set forth in SEQ ID NO:24.
 32. The polypeptide molecule according to claim 24, wherein S₁ is the polypeptide sequence set forth in SEQ ID NO:4, S₂ is the polypeptide sequence set forth in SEQ ID NO:11, S₃ is the polypeptide sequence set forth in SEQ ID NO:18, and S₄ is the polypeptide sequence set forth in SEQ ID NO:25.
 33. The polypeptide molecule according to claim 24, wherein S₁ is the polypeptide sequence set forth in SEQ ID NO:5, S₂ is the polypeptide sequence set forth in SEQ ID NO:12, S₃ is the polypeptide sequence set forth in SEQ ID NO:19, and S₄ is the polypeptide sequence set forth in SEQ ID NO:26.
 34. The polypeptide molecule according to claim 24, wherein S₁ is the polypeptide sequence set forth in SEQ ID NO:6, S₂ is the polypeptide sequence set forth in SEQ ID NO:13, S₃ is the polypeptide sequence set forth in SEQ ID NO:20, and S₄ is the polypeptide sequence set forth in SEQ ID NO:27.
 35. The polypeptide molecule according to claim 24, wherein S₁ is the polypeptide sequence set forth in SEQ ID NO:7, S₂ is the polypeptide sequence set forth in SEQ ID NO:14, S₃ is the polypeptide sequence set forth in SEQ ID NO:21, and S₄ is the polypeptide sequence set forth in SEQ ID NO:28.
 36. The polypeptide molecule according to claim 24, wherein A₁ is an amino acid moiety selected from the group consisting of valine, alanine, and leucine.
 37. The polypeptide molecule according to claim 24, wherein A₂ is an amino acid moiety selected from the group consisting of glycine, valine, and leucine.
 38. The polypeptide molecule according to claim 24, wherein A₃ is an amino acid moiety selected from the group consisting of aspartic acid, glutamic acid, asparagine, and glutamine.
 39. The polypeptide molecule according to claim 24, wherein A₄ is an amino acid moiety selected from the group consisting of alanine, glycine, valine, leucine, isoleucine, and methionine.
 40. The polypeptide molecule according to claim 24, wherein A₅ is an amino acid moiety selected from the group consisting of asparagine, glutamine, aspartic acid, and glutamic acid.
 41. The polypeptide molecule according to claim 24, wherein A₆ is an amino acid moiety selected from the group consisting of glutamic acid, glutamine, aspartic acid, and asparagine.
 42. The polypeptide molecule according to claim 24, wherein A₁ is valine.
 43. The polypeptide molecule according to claim 24, wherein A₂ is glycine.
 44. The polypeptide molecule according to claim 24, wherein A₃ is aspartic acid.
 45. The polypeptide molecule according to claim 24, wherein A₄ is alanine.
 46. The polypeptide molecule according to claim 24, wherein A₅ is asparagine.
 47. The polypeptide molecule according to claim 24, wherein A₆ is glutamic acid.
 48. The polypeptide molecule according to claim 24, wherein said molecule reduces a thioredoxin protein obtained from an organism selected from the group consisting of E. coli, Bacillus subtilis, Mycobacterium leprae, Saccharomyces, Neurospora crassa, Arabidopsis, and human.
 49. The polypeptide molecule according to claim 24, wherein said reduction of thioredoxin takes place in the presence of a co-factor.
 50. The polypeptide molecule according to claim 24, wherein said co-factor is NADPH or NADH.
 51. The polypeptide molecule according to claim 24, wherein said co-factor is NADH.
 52. The polypeptide molecule according to claim 24, wherein said polypeptide shows greater than 100 times more affinity for NADPH than for NADH.
 53. The polypeptide molecule according to claim 24, wherein said polypeptide shows greater than 50 times more affinity for NADPH than for NADH.
 54. The polypeptide molecule according to claim 24, wherein said polypeptide shows greater than 25 times more affinity for NADPH than for NADH.
 55. The isolated polypeptide molecule according to claim 24, wherein said polypeptide is fused to a second polypeptide.
 56. The polypeptide molecule according to claim 55, wherein said polypeptide is fused to said second polypeptide through a linker.
 57. The polypeptide molecule according to claim 56, wherein said linker comprises a polypeptide sequence having between about 5 and about 50 amino acids.
 58. The polypeptide molecule according to claim 56, wherein said linker comprises a polypeptide sequence having between about 10 and about 40 amino acids.
 59. The polypeptide molecule according to claim 56, wherein said linker comprises a polypeptide sequence having between about 15 and about 25 amino acids.
 60. The polypeptide molecule according to claim 56, wherein said second polypeptide is thioredoxin.
 61. The polypeptide molecule according to claim 56, wherein said polypeptide is further fused to a third polypeptide.
 62. The polypeptide molecule according to claim 56 wherein said polypeptide is fused to said third polypeptide through a linker.
 63. The polypeptide molecule according to claim 56 or 62, wherein said linker comprises a polypeptide sequence having a molecular weight between about 5 and about 100 kDa.
 64. The polypeptide molecule according to claim 56 or 62, wherein said linker comprises a polypeptide sequence having a molecular weight between about 20 and about 70 kDa.
 65. The polypeptide molecule according to claim 56 or 62, wherein said linker comprises a polypeptide sequence having a molecular weight between about 25 and about 45 kDa.
 66. The polypeptide molecule according to claim 56 or 62, wherein said third polypeptide is oleosin.
 67. A method for producing a plant with a modified TR protein comprising: (a) introducing an expression cassette comprising a promoter functional in a plant cell operably linked to a DNA molecule encoding a modified thioredoxin reductase (TR) protein according to claim 1 or 22 comprising an amino-terminal chloroplast transit peptide into the cells of a plant so as to yield transformed plant cells; and (b) regenerating said transformed plant cells to provide a differentiated transformed plant, wherein expression of the DNA molecule encoding the modified TR protein in said plant alters the co-factor specificity compared to the untransformed plant.
 68. A method according to claim 67 wherein said transformed plant expresses a modified TR protein wherein said cofactor specificity is NADPH or NADH.
 69. A method according to claim 67 wherein said transformed plant expresses a modified TR protein wherein said cofactor specificity is switched to NADH.
 70. A method according to claim 67 wherein said transformed plant expresses a modified TR protein wherein said cofactor specificity is altered such that said modified TR protein preferentially binds NADPH compared to NADH.
 71. A method according to claim 1 or 2 wherein said transformed plant expresses a modified TR protein wherein said cofactor specificity is altered such that said modified TR protein exhibits improved catalytic efficiency for NADPH as compared to a wild-type TR protein in an untransformed plant.
 72. A transformed plant prepared by the method of claim 67.
 73. A transformed seed of said transformed plant of claim 72.
 74. A method for making oil bodies comprising a modified thioredoxin reductase (TR) protein comprising: a) producing in a cell a modified TR protein according to claim 1 or 2; b) associating said modified TR protein with oil bodies through an oil body targeting protein capable of associating with said modified TR protein and said oil bodies; and c) obtaining said oil bodies associated with said modified TR protein.
 75. A method according to claim 74 further comprising: a) washing the oil bodies to obtain a washed oil body preparation comprising said modified TR protein; and b) formulating said washed oil bodies into an emulsion.
 76. A method according to claim 74 wherein said oil bodies are used in the preparation of non-allergenic foods.
 77. A method according to claim 74 wherein said oil bodies are used in the preparation of animal feeds to improve the digestibility of said feeds.
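Claim 24 restricts the six variable positions A₁ to A₆ of Formula I to listed residue sets and, through its proviso, requires at least one position to differ from the residue named for it (serine at A₁, alanine at A₂, histidine at A₃, and arginine at A₄, A₅ and A₆). The Python sketch below shows one way such a candidate could be screened against those constraints. It is an illustrative assumption rather than part of the described method: the one-letter residue encoding, the dictionary layout and the function name `satisfies_formula_i` are hypothetical, and the flanking segments S₁ to S₄ (including any "substantial similarity" test) are not checked.

```python
from typing import Dict

# Hypothetical encoding of claim 24: allowed residues (one-letter codes)
# at each variable position A1-A6 of Formula I.
ALLOWED: Dict[str, frozenset] = {
    "A1": frozenset("SVGALIMFW"),
    "A2": frozenset("AGVLIMFW"),
    "A3": frozenset("HDERLSTCNQY"),
    "A4": frozenset("RAGVLIMFW"),
    "A5": frozenset("RNQDECSTK"),
    "A6": frozenset("RENQDCSTK"),
}

# Residues named in the proviso of claim 24; at least one position
# of a qualifying variant must differ from these.
PROVISO: Dict[str, str] = {
    "A1": "S", "A2": "A", "A3": "H", "A4": "R", "A5": "R", "A6": "R",
}


def satisfies_formula_i(residues: Dict[str, str]) -> bool:
    """Return True if the residues at A1-A6 meet the claim 24 constraints.

    Only the variable positions are checked; S1-S4 are ignored.
    """
    if set(residues) != set(ALLOWED):
        raise ValueError("expected exactly the positions A1-A6")
    # Every position must carry a residue from its allowed set.
    in_sets = all(residues[pos] in ALLOWED[pos] for pos in ALLOWED)
    # The proviso: at least one position departs from the named residue.
    differs = any(residues[pos] != PROVISO[pos] for pos in PROVISO)
    return in_sets and differs


if __name__ == "__main__":
    # Combines the residues preferred individually in claims 42-47.
    preferred = {"A1": "V", "A2": "G", "A3": "D", "A4": "A", "A5": "N", "A6": "E"}
    print(satisfies_formula_i(preferred))      # True
    print(satisfies_formula_i(dict(PROVISO)))  # False: no position changed
```

Keeping each position's residue list in a single lookup table mirrors the structure of claim 24, so the narrower dependent claims (for example claims 36 to 41) could be expressed by swapping in smaller sets.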